You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press 198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2024946925
ISBN 9780192867407
ISBN 9780192867414 (pbk.)
DOI: 10.1093/oso/9780192867407.001.0001
Printed and bound by CPI Group (UK) Ltd, Croydon, CR0 4YY
Cover image: The authors
Links to third party websites are provided by Oxford in good faith and for information only. Oxford disclaims any responsibility for the materials contained in any third party website referenced in this work.
Preface
I saw Eternity the other night
Like a great Ring of pure and endless light,
All calm as it was bright,
And round beneath it, Time in hours, days, years
Driv'n by the spheres
Like a vast shadow mov'd. In which the world
And all her train were hurl'd.
Henry Vaughan (1621-1695) The World
You sometimes speak of gravity as essential and inherent to matter. Pray do not ascribe that notion to me, for the cause of gravity is what I do not pretend to know and therefore would take more time to consider of it.
Sir Isaac Newton (1642-1726) Letter to Richard Bentley
Albert Einstein's crowning theoretical achievement was his formulation of his general theory of relativity, a theory of gravity that superseded Isaac Newton's approach and transformed our view of the Universe. It took a century for one of its key predictions, the existence of gravitational waves, to be verified, but it is a measure of how persuasive its underlying principles have been that no-one seriously doubted that gravitational waves would eventually be detected. General relativity engages profoundly with the nature of space and time (even more so than Henry Vaughan did in his magnificent poem quoted above) and provides the key ideas missing from Newton's theory (the deficiency of which Newton was keenly aware, as indicated in the second quotation). Our point of view in writing this book is that everyone should have the opportunity to engage with this beautiful theory which, conceptually, is based on simple ideas from the physics of fields. The mastery of the machinery of general relativity does, however, require some facility with mathematics that is likely to be unfamiliar to many students of physics, but the payoff is so great and the material so stimulating that we hope the reader will join us in exploring one of the greatest achievements in physics.
The text follows the same approach as our earlier Quantum Field Theory for the Gifted Amateur (QFTGA) (OUP, 2014) and this is perhaps a good moment to restate what we mean by the slightly tongue-in-cheek term 'gifted amateur'. We are not writing for the mathematically uninitiated and do assume that our reader has a background in physics. However, we are not writing for experts either and aim to provide an entry point to a profound topic that we hope readers will find both entertaining and useful. The use of the term 'gifted amateur' encourages
the potential reader to have a go for themselves, conveying the feeling that a difficult subject is open to those who consider themselves non-experts. We adopt the same approach used in writing QFTGA, dividing the material into short and easily digestible chapters, spelling out mathematical steps in worked examples and illustrating the arguments with hand-drawn figures. In response to requests from readers of QFTGA, we also include a large number of problems with worked solutions. We have both had a long-standing fascination with the subject and a conviction that it belongs more centrally in the physics curriculum. However, we have not lost the memory of finding some of this material difficult and so hope that our book will give a curious reader a more patient and illuminating guide to this subject than they would find in some of the more weighty and established tomes.
The first two parts of the book introduce the main concepts that lead to the formulation of the Einstein field equations, and this material concludes with an outline description of the most important implications of the theory. These implications are worked out in much more detail in the middle section of the book, with the third part covering cosmology and the fourth part detailing the consequences for orbits and black holes. General relativity is a theory about the geometry of spacetime and a more mathematical treatment of geometry is given in the fifth part of the book for those with an appetite to explore these aspects in more detail. The final part of the book returns to field theory, framing general relativity as a classical field theory and looking forward to how it might be formulated as a quantum field theory in a future theory of quantum gravity.
In writing this volume we are particularly grateful to the following individuals who have helped us: Rodrigo Alonso, Nathan Bentley, Katherine Blundell, Theo Breeze, Harvey Brown, Andrei Constantin, Felix Flicker, Martin Galpin, Matjaž Gomilšek, Thomas Hicken, Ben Huddart, Ifan Hughes, Baojiu Li, Guillaume Mahler, and Trevor Wishart. These individuals have been generous with their time and have helped improve the book, but any errors that are found post-publication will be posted on the book's website: http://tomlancaster.webspace.durham.ac.uk/grgabook
We are also grateful to our copy editor Aravind Kannankara. Finally, we thank Cally, Eden, and Katherine for their patience, love, and support.
TL & SJB
Contents
0 Overture ..... 1
0.1 What is relativity? ..... 1
0.2 What is general relativity? ..... 3
0.3 What is a metric? ..... 5
0.4 What are we building on? ..... 6
0.5 Who is this book for? ..... 8
0.6 Units in this book ..... 10
Exercises ..... 11
I Geometry and mechanics in flat spacetime
1 Special relativity ..... 12
1.1 A common sense start ..... 12
1.2 The speed of light ..... 13
1.3 Light cones and the Lorentz transformation ..... 15
1.4 Paths through spacetime ..... 18
1.5 Experiments ..... 19
Exercises ..... 21
2 Vectors in flat spacetime ..... 22
2.1 Vectors ..... 23
2.2 Coordinate transformations ..... 24
2.3 Examples of vectors ..... 27
2.4 Principle of least action ..... 34
Exercises ..... 34
3 Coordinates ..... 36
3.1 Coordinates in Euclidean space ..... 36
3.2 Farewell to the position vector ..... 39
3.3 Non-Euclidean space ..... 40
Exercises ..... 41
4 Linear slot machines ..... 43
4.1 Dot products and down vectors ..... 44
4.2 Vectors and 1-forms ..... 46
4.3 Transformations ..... 49
4.4 Tensors ..... 50
4.5 Energy-momentum tensor ..... 52
Exercises ..... 55
5 The metric ..... 56
5.1 Metrics in general ..... 56
5.2 Meet some metrics ..... 58
5.3 Light and light cones ..... 60
5.4 Lengths, areas, volumes ..... 62
Exercises ..... 65
II Curvature and general relativity ..... 67
6 Finding a theory of gravitation ..... 68
6.1 Free fall and the equivalence principle ..... 68
6.2 Why general relativity? ..... 73
6.3 A differential equation to describe gravity ..... 75
6.4 Local flatness ..... 76
6.5 Time dilation in a gravitational field ..... 77
Exercises ..... 79
7 Parallel lines and the covariant derivative ..... 81
7.1 Parallelism ..... 82
7.2 Derivatives and connections ..... 83
7.3 The covariant derivative ..... 85
7.4 Parametrized paths ..... 86
7.5 Enter the metric ..... 88
Exercises ..... 89
8 Free fall and geodesics ..... 91
8.1 Extremal intervals ..... 91
8.2 A geodesic equation ..... 95
8.3 Inertial forces ..... 96
8.4 Geodesics for photons ..... 98
Exercises ..... 99
9 Geodesic equations and connection coefficients ..... 101
9.1 Finding connection coefficients ..... 101
9.2 The geodesic equation from the action ..... 104
Exercises ..... 105
10 Making measurements in relativity ..... 108
10.1 Observers and their observations ..... 108
10.2 Coordinate and non-coordinate bases ..... 110
10.3 The orthonormal frame ..... 114
10.4 Freely falling frames ..... 116
Exercises ..... 118
11 Riemann curvature and the Ricci tensor ..... 120
11.1 What is curvature? ..... 120
11.2 Tidal forces ..... 121
11.3 Riemann curvature ..... 124
11.4 Symmetries of the Riemann tensor ..... 126
11.5 The Ricci tensor and Ricci scalar ..... 127
11.6 Example computations ..... 128
11.7 Geodesic deviation revisited ..... 129
Exercises ..... 130
12 The energy-momentum tensor ..... 131
12.1 Another look at the energy-momentum tensor ..... 131
12.2 Example energy-momentum tensors ..... 132
12.3 Classical particles ..... 134
12.4 Conservation laws ..... 136
Exercises ..... 140
13 The gravitational field equations ..... 141
13.1 Geometry: a recap of the key ingredients ..... 141
13.2 Physics: the key ingredients ..... 142
13.3 An incorrect guess ..... 145
13.4 Einstein's field equation ..... 147
Exercises ..... 150
14 The triumphs of general relativity ..... 151
14.1 Weak fields and the Newtonian limit ..... 151
14.2 Gravitational waves ..... 153
14.3 Stars, trajectories, and orbits ..... 155
14.4 Cosmology ..... 156
III Cosmology ..... 157
15 An introduction to cosmology ..... 158
15.1 The cosmological principle ..... 159
15.2 The Hubble flow ..... 160
15.3 Cosmic time ..... 162
15.4 Universe 0 : an empty universe ..... 163
15.5 Universe 1: flat and expanding ..... 164
Exercises ..... 167
16 Robertson-Walker spaces ..... 169
16.1 Spaces with constant curvature ..... 169
16.2 Three Robertson-Walker spaces ..... 173
16.3 Redshift and cosmic expansion ..... 176
16.4 The initial singularity ..... 178
Exercises ..... 179
17 The Friedmann equations ..... 181
17.1 Enter energy-momentum ..... 182
17.2 Enter thermodynamics ..... 183
17.3 Dust and radiation ..... 184
Exercises ..... 187
18 Universes of the past and future ..... 188
18.1 Spatially flat universes ..... 188
18.2 Curved universes with $\Lambda=0$ ..... 191
18.3 Einstein, Lemaître and Eddington ..... 192
18.4 A brief history of model universes ..... 196
Exercises ..... 199
19 Causality, infinity, and horizons ..... 201
19.1 Penrose diagrams ..... 202
19.2 The de Sitter spacetime ..... 209
19.3 Big-Bang singularities ..... 211
Exercises ..... 214
IV Orbits, stars, and black holes ..... 217
20 Newtonian orbits ..... 218
20.1 Kepler's laws ..... 219
20.2 Anatomy of an orbit ..... 220
20.3 Effective potentials ..... 222
20.4 Allowed trajectories ..... 223
20.5 The why? of orbits ..... 225
Exercises ..... 227
21 The Schwarzschild geometry ..... 229
21.1 Justifying the solution ..... 230
21.2 Components of the Riemann tensor ..... 231
21.3 A gravitating object ..... 232
21.4 The meaning of the coordinates ..... 234
Exercises ..... 235
22 Motion in the Schwarzschild geometry ..... 237
22.1 Constants of the motion ..... 238
22.2 Gravitational redshift ..... 239
22.3 Motion in Schwarzschild spacetime ..... 240
22.4 Example: the radial plunge ..... 242
Exercises ..... 244
23 Orbits in the Schwarzschild geometry ..... 246
23.1 Orbits for massive particles ..... 246
23.2 Stable circular orbits ..... 248
23.3 Precession of the perihelion ..... 249
Exercises ..... 252
24 Photons in the Schwarzschild geometry ..... 254
24.1 Photon trajectories ..... 254
24.2 Looking around ..... 258
Exercises ..... 261
25 Black holes ..... 262
25.1 The surface $r=2M$ ..... 264
25.2 The tortoise coordinate ..... 265
25.3 Death of an astronaut ..... 266
25.4 Looking around near a black hole ..... 267
25.5 Gravitational collapse ..... 268
Exercises ..... 270
26 Black-hole singularities ..... 272
26.1 Singularities ..... 272
26.2 Eddington-Finkelstein coordinates ..... 275
Exercises ..... 279
27 Kruskal-Szekeres coordinates ..... 280
27.1 Enter the Kruskal metric ..... 280
27.2 Wormholes ..... 284
27.3 Another Penrose diagram ..... 285
Exercises ..... 287
28 Hawking radiation ..... 289
28.1 Hawking radiation ..... 289
28.2 Black-hole thermodynamics ..... 292
Exercises ..... 296
29 Charged and rotating black holes ..... 297
29.1 Charged black holes ..... 297
29.2 Kerr black holes ..... 299
29.3 Interacting with the Kerr geometry ..... 304
Exercises ..... 306
V Geometry ..... 307
30 Classical curvature ..... 308
30.1 Curvature of a line ..... 308
30.2 Curvature with vectors ..... 310
30.3 Two-dimensional surfaces ..... 312
30.4 Gauss' equation ..... 314
30.5 Intrinsic and extrinsic curvature ..... 316
30.6 Riemann's project ..... 318
Exercises ..... 320
31 A reintroduction to geometry ..... 322
31.1 Old notions of vectors and gradients ..... 323
31.2 Vectors and vector fields ..... 324
31.3 Linear slot machines again ..... 327
31.4 Tensors again ..... 329
31.5 Examples of tensor operations ..... 330
Exercises ..... 332
32 Differential forms ..... 334
32.1 2-forms ..... 334
32.2 p-forms ..... 336
32.3 p-vectors ..... 337
Exercises ..... 339
33 Exterior and Lie derivatives ..... 340
33.1 Exterior calculus ..... 340
33.2 Commutators ..... 342
33.3 Lie derivatives of vectors ..... 344
33.4 Lie derivatives of tensors ..... 347
33.5 Killing vectors ..... 348
Exercises ..... 350
34 Geometry of the connection ..... 351
34.1 Covariant derivative in pictures ..... 352
34.2 Connection and exterior derivative ..... 353
34.3 Covariant derivative of tensors ..... 355
34.4 The metric revisited ..... 358
Exercises ..... 361
35 Riemann curvature revisited ..... 363
35.1 Geodesic deviation (slight return) ..... 363
35.2 Components of the curvature tensor ..... 366
35.3 Parallel transport again ..... 368
35.4 The meaning of the Ricci tensor ..... 370
Exercises ..... 372
36 Cartan's method ..... 374
36.1 Connection 1-forms ..... 374
36.2 Two rules ..... 377
36.3 Le repère mobile ..... 379
36.4 Example computations ..... 380
Exercises ..... 385
37 Duality and the volume form ..... 386
37.1 Motivation: 2-forms and flux ..... 386
37.2 Hodge star operation ..... 387
37.3 Volume forms ..... 392
Exercises ..... 395
38 Forms, chains, and Stokes' theorem ..... 397
38.1 Integration ..... 397
38.2 Integrating over forms ..... 400
38.3 Anatomy of an integral ..... 401
38.4 Boundaries and chains ..... 404
38.5 Stokes' theorem ..... 405
Exercises ..... 408
VI Classical and quantum fields ..... 411
39 Fluids as dry water ..... 412
39.1 Euler's equation ..... 413
39.2 Energy and Bernoulli's equation ..... 415
39.3 Energy-momentum tensor ..... 418
39.4 Relativistic fluids ..... 420
Exercises ..... 425
40 Lagrangian field theory ..... 428
40.1 Matter fields ..... 429
40.2 Action and equations of motion ..... 430
40.3 Fields in curved spacetime ..... 433
40.4 Motivating the Einstein equation ..... 434
40.5 Energy-momentum tensor ..... 437
40.6 Noether's theorem ..... 438
40.7 The perfect fluid ..... 440
Exercises ..... 443
41 Inflation ..... 445
41.1 Symmetry breaking ..... 446
41.2 Effective potentials ..... 449
41.3 Why flat? ..... 451
Exercises ..... 452
42 The electromagnetic field ..... 453
42.1 Electric charge in a field ..... 453
42.2 Faraday tensor and Maxwell equations ..... 455
42.3 Gauge freedom ..... 458
42.4 Geometrical electromagnetism ..... 460
Exercises ..... 464
43 Charge conservation and the Bianchi identity ..... 467
43.1 Conserving electric charge ..... 467
43.2 Electromagnetic gauge field ..... 469
43.3 Gravitational curvature ..... 471
Exercises ..... 475
44 Gauge fields ..... 476
44.1 Fibre bundles and gauge invariance ..... 476
44.2 Parallel transport and field strength ..... 480
Exercises ..... 483
45 Weak gravitational fields ..... 485
45.1 The Newtonian limit ..... 485
45.2 Linearized theory of gravitation ..... 487
45.3 Exploiting gauges ..... 488
Exercises ..... 492
46 Gravitational waves ..... 494
46.1 Waves in a gauge theory ..... 494
46.2 Lorenz gauge for gravitational waves ..... 496
46.3 Quadrupolar radiation ..... 501
46.4 Radiated energy and power ..... 503
46.5 An exact solution ..... 505
46.6 The discovery of gravitational waves ..... 506
Exercises ..... 509
47 The properties of gravitons ..... 512
47.1 Force-carrying particles ..... 512
47.2 Photon propagation and polarization ..... 514
47.3 Graviton propagation and polarization ..... 516
Exercises ..... 519
48 Higher dimensional spacetime ..... 520
48.1 Gauge transformations in five dimensions ..... 521
48.2 Unifying electromagnetism and gravitation ..... 522
Exercises ..... 525
49 From classical to quantum gravity ..... 527
49.1 Extra dimensions ..... 527
49.2 String theory ..... 530
49.3 Parametrizing the string ..... 532
49.4 Strings in relativity ..... 534
49.5 Superspace ..... 536
49.6 Loop quantum gravity ..... 537
49.7 Anti-de Sitter spacetime ..... 539
49.8 Our current best guess ..... 542
Exercises ..... 545
50 The Big-Bang singularity ..... 547
50.1 Facts about Euclidean geometry ..... 547
50.2 Orthogonal geodesics in spacetime ..... 548
50.3 Our Universe ..... 551
Exercises ..... 552
A Further reading ..... 554
B Conventions and notation ..... 562
B.1 Electromagnetic units ..... 562
B.2 Vectors, 1-forms and tensors ..... 562
B.3 Covariant derivatives ..... 564
C Manifolds and bundles ..... 565
C.1 Preliminaries ..... 566
C.2 Maps and functions ..... 567
C.3 One-to-one, into, and onto ..... 567
C.4 Continuous maps ..... 568
C.5 Manifolds, coordinates, and charts ..... 569
C.6 Functions on the manifold ..... 571
C.7 Differentiation on the manifold ..... 572
C.8 Compact regions ..... 575
C.9 Curves ..... 575
C.10 Tangent spaces ..... 578
C.11 Fibre bundles ..... 578
D Embedding ..... 581
Exercises ..... 586
E Answers to selected problems ..... 587
Index ..... 614
Overture
Our Theory of Gravitation is as good as perfect: Lagrange, it is well known, has proved that the Planetary System, on this scheme, will endure forever; Laplace, still more cunningly, even guesses that it could not have been made on any other scheme.
Thomas Carlyle (1795-1881) Sartor Resartus
General relativity is one of the most profound statements in science. It is a theory of gravity that allows us to model the large-scale structure of the Universe; to understand and explain the workings of black holes; to reveal how gravity interacts with light waves and even how the Universe hosts its own, gravitational, waves. It is central to our notions of where the Universe comes from and what its eventual fate might be. The theory's conception was largely the work of one remarkable scientist.$^{1}$ General relativity is often viewed as a fearsomely difficult theory whose mastery is a rite of passage into the world of advanced physics. However, as we will show, the theory is based on simple principles which are straightforward to grasp. This initial chapter will outline the path we will take through the book and will introduce some important bits of jargon. We start with the word relativity.
0.1 What is relativity?
Newton's$^{2}$ first law states that a body with no force acting on it will move in a straight line with a uniform velocity. This statement would be true if viewed in any inertial reference frame ('inertial' here means that the reference frame, which defines the coordinates used, is not accelerating). There are lots of inertial reference frames to choose from (all moving at different speeds and in different directions with respect to each other), but in all of them, Newton's first law holds. Even before Einstein came on the scene it was possible to formulate a principle of relativity:
The principle of relativity:
Physical laws are the same in all inertial reference frames.
This implies that there is no absolute rest frame in Newtonian physics.$^{3}$ Any inertial reference frame will do, and we then have to describe motion relative to the inertial reference frame we have chosen.
$^{1}$Albert Einstein (1879-1955).
$^{3}$A rest frame of a particle is that frame of reference in which a particle is measured to be at rest.
Fig. 1 Juggling is best performed in (a) an inertial reference frame, or one which is accelerating constantly, rather than (b) one which has a time-varying acceleration $a(t)$.
Example 0.1
Physical processes follow simple laws in inertial frames, because we can then apply Newton's laws in their simplest form.
A juggler will prefer to carry out their juggling when standing on a fixed floor [Fig. 1(a)]. They are then in an inertial rest frame and can effectively calculate the parabolic Newtonian trajectories of all the balls in their head, taking into account only the effect of gravity.
However, you can juggle a set of balls equally well on a moving train or in a moving plane, as long as you are travelling at a constant velocity (i.e. as long as you are in an inertial frame). The same Newtonian laws apply as before. Einstein's special theory of relativity is concerned with these inertial frames of reference.
In fact, juggling will also be possible if the train or plane is in a state of constant acceleration. In that case, the juggler would not be in an inertial frame but the uniform acceleration would be indistinguishable from an additional gravitational field, and the juggler would be able to correct for this effect without difficulty, again using Newtonian laws. This idea is at the root of the equivalence principle that underlies Einstein's general theory of relativity.
Juggling is very difficult though if the acceleration is rapidly time-varying (i.e. if the train suddenly jolts forward or shakes backwards and forwards) because additional time-varying forces would then act on the balls [Fig. 1(b)].
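The contrast between panels (a) and (b) of Fig. 1 can be made concrete with a short Python sketch. It assumes that the ground is an inertial frame, that the ball moves horizontally at constant speed $v_x$ there, and that the train's displacement $X(t)$ is a known function. The names `apparent_acceleration`, `steady` and `jolting`, and all numerical values, are illustrative choices rather than anything from the text. Seen from the train, the ball's horizontal position is $x' = v_x t - X(t)$, so its apparent acceleration is $-\ddot{X}(t)$.

```python
import math


def apparent_acceleration(train_displacement, t, vx=1.0, h=1e-3):
    """Horizontal acceleration of a juggled ball as seen from the train.

    In the ground frame (assumed inertial) the ball has constant horizontal
    velocity vx. In the train frame its position is x' = vx*t - X(t), where
    X(t) is the train's displacement. A central second difference estimates
    d^2x'/dt^2 at time t.
    """
    def x_prime(s):
        return vx * s - train_displacement(s)

    return (x_prime(t + h) - 2.0 * x_prime(t) + x_prime(t - h)) / h**2


A_TRAIN = 2.0  # m/s^2, illustrative constant acceleration of the train


def steady(s):
    """Train accelerating uniformly from rest: X = a s^2 / 2."""
    return 0.5 * A_TRAIN * s**2


def jolting(s):
    """Train shaking back and forth: X = 0.05 sin(20 s) (illustrative)."""
    return 0.05 * math.sin(20.0 * s)


for t in (0.1, 0.2, 0.3):
    print(f"t = {t:.1f} s   steady: {apparent_acceleration(steady, t):+7.3f} m/s^2"
          f"   jolting: {apparent_acceleration(jolting, t):+7.3f} m/s^2")
```

For the uniformly accelerating train the apparent acceleration is the same constant, $-a$, at every instant, which the juggler can absorb into an effective gravitational field. For the jolting train it swings between large positive and negative values, so no single correction will do.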
The principle of relativity is, in effect, a symmetry principle. It tells us that physics works in the same way, however we choose our coordinates, as long as our coordinates are described relative to an inertial reference frame. We can transform from one inertial set of coordinates to another by rotating, translating, or even what we will call 'boosting'. A boost is a transformation to another coordinate system moving with uniform velocity with respect to the initial one.
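As a small illustration of this symmetry in its Galilean, pre-Einstein form, the Python sketch below checks that the Newtonian rule for a one-dimensional elastic collision gives consistent answers in two frames related by a boost. The function names and numbers are illustrative assumptions. The boost simply subtracts the frame's velocity $V$ from every velocity, which is appropriate for Newtonian physics.

```python
import math


def elastic_collision(m1, u1, m2, u2):
    """Final velocities after a 1D elastic collision (conserves momentum and kinetic energy)."""
    total = m1 + m2
    v1 = ((m1 - m2) * u1 + 2.0 * m2 * u2) / total
    v2 = ((m2 - m1) * u2 + 2.0 * m1 * u1) / total
    return v1, v2


def galilean_velocity(u, frame_velocity):
    """Velocity u as measured in a frame moving at frame_velocity (Galilean boost)."""
    return u - frame_velocity


m1, u1, m2, u2 = 2.0, 3.0, 1.0, -1.0   # kg and m/s, illustrative values
V = 5.0                                 # m/s, velocity of the boosted frame

# Route 1: solve the collision in the original frame, then boost the result.
v1, v2 = elastic_collision(m1, u1, m2, u2)
boosted_after = (galilean_velocity(v1, V), galilean_velocity(v2, V))

# Route 2: boost the initial velocities, then apply the same law in the new frame.
after_boosted = elastic_collision(m1, galilean_velocity(u1, V),
                                  m2, galilean_velocity(u2, V))

print(boosted_after, after_boosted)
print(all(math.isclose(a, b) for a, b in zip(boosted_after, after_boosted)))
```

Both routes agree because the collision rule keeps the same form after the boost. This is the sense in which physics works in the same way in both frames.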
Example 0.2
An example of the independence of physics under boosts that is familiar to many is the sensation one experiences when seated on a train in a station and observing a neighbouring train moving forward. For a moment, you might think that your train is moving backward, and you need to check some fixed object on the station platform before you are sure which situation has occurred, and in effect whether your train is still in the station reference frame or in a new boosted reference frame.
The principle of relativity has been understood for a long time; Newton and Galileo accepted it. As we will explore in more detail in Chapter 1, Einstein's first revolutionary step, made in 1905, was to add a second postulate:
The principle of invariant light speed:
As measured in any inertial reference frame, light propagates in empty space with a definite speed $c$ that is independent of the state of motion of the emitting body.
This principle has all manner of strange consequences that form the basis of Einstein's special theory of relativity. Why is it special? Because it is a theory that focuses on inertial reference frames and ignores gravity. Thus, it is restricted to some special (but important) cases. A good physical theory is said to be covariant if it transforms sensibly$^{4}$ under coordinate transformations. Special relativity is a theory which is covariant with respect to translations, rotations, reflections, and boosts. The boosts have to be carried out consistently with the principle of invariant light speed, and we will see in Chapter 1 that this requires a Lorentz transformation. Thus special relativity is said to be a theory which possesses Lorentz covariance.
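To see why the boost must change once light speed is taken to be invariant, the following Python sketch applies a Lorentz boost along $x$ to an event on a light pulse emitted from the origin. It uses the standard form $t' = \gamma(t - vx/c^2)$, $x' = \gamma(x - vt)$ with $\gamma = (1 - v^2/c^2)^{-1/2}$, and compares the result with a Galilean boost, $t' = t$, $x' = x - vt$. It is an illustration, not the book's own code. SI units and the boost speed $v = 0.6c$ are assumptions made for the example.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def lorentz_boost(t, x, v, c=C):
    """Coordinates (t', x') of the event (t, x) in a frame moving at speed v along x."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return gamma * (t - v * x / c**2), gamma * (x - v * t)


def galilean_boost(t, x, v):
    """Newtonian boost: time is unchanged and positions shift by v*t."""
    return t, x - v * t


v = 0.6 * C
t, x = 1.0, C * 1.0   # an event on a light pulse that left the origin at t = 0

tl, xl = lorentz_boost(t, x, v)
tg, xg = galilean_boost(t, x, v)

print("Lorentz:  pulse speed / c =", xl / tl / C)
print("Galilean: pulse speed / c =", xg / tg / C)
```

The Lorentz boost returns a pulse speed of $c$ (up to rounding error). The Galilean boost gives $c - v$, which conflicts with the principle of invariant light speed.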
Special relativity tells us that nothing can go faster than light. Thus, on a spacetime diagram, that is, a graph with time running up the page with spatial coordinates perpendicular, an observer's future and past can be represented as being inside a forward and backward light cone (see Fig. 2). Anything the observer can do now (throw a stone, shine a torch) can only influence the region of spacetime inside, or on, the forward light cone; anything that influences the observer now (the appearance of the night sky, an assassin's bullet) can only originate from inside, or on, the backward light cone. Moreover, if we populate spacetime with lots and lots of observers at different points, each will have their own light cone and all these light cones will be oriented in the same way [see Fig. 3(a)]. We shall see that this is a description of what is known as flat spacetime and is the situation that we assume to hold in special relativity.
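The light-cone description can be turned into a simple test. The Python sketch below assumes units with $c = 1$ and an observer at the origin. It compares the distance light can travel in the time separation, $c|t|$, with the spatial distance $r$ of an event:

- $c|t| > r$ puts the event inside the forward or backward light cone.
- $c|t| = r$ puts it on the light cone.
- $c|t| < r$ puts it outside the light cone, where the observer can neither influence it nor be influenced by it.

The function name and tolerance are hypothetical choices. Calling `math.hypot` with three arguments needs Python 3.8 or later.

```python
import math


def classify_event(t, x, y, z, c=1.0, tol=1e-12):
    """Locate the event (t, x, y, z) relative to the light cone of an observer at the origin."""
    r = math.hypot(x, y, z)   # spatial distance from the observer
    ct = c * abs(t)           # distance light can travel in the time |t|
    if ct < tol and r < tol:
        return "here and now"
    future = t > 0
    if abs(ct - r) <= tol:
        return "on forward light cone" if future else "on backward light cone"
    if ct > r:
        return "inside forward light cone" if future else "inside backward light cone"
    return "outside light cone"


for event in [(2.0, 1.0, 0.0, 0.0), (-1.0, 0.0, 1.0, 0.0),
              (1.0, 0.6, 0.8, 0.0), (0.5, 3.0, 0.0, 0.0)]:
    print(event, "->", classify_event(*event))
```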
Fig. 3 (a) The light cones in flat spacetime all line up at different points, like soldiers on parade. (b) The light cones in curved spacetime look much more disorderly, as if some of the soldiers on parade now have too much alcohol in their bloodstream.
0.2 What is general relativity?
This book is about Einstein's general theory of relativity, in which gravity is described. To understand the significance of what Einstein did, it is helpful first to take a step back. Newton constructed a theory of gravity, published in his Principia in 1687, which meant that for the first time it was possible to appreciate that the same force that caused the Moon to orbit the Earth also caused the famous (and probably apocryphal) apple to fall from the tree. Newtonian gravity could be described by an equation, $F=GMm/r^{2}$, relating$^{5}$ the magnitude of the force $F$ between masses $M$ and $m$ separated by distance $r$. This inverse-square relationship beautifully explains the elliptical motions of the planets and led many people to think that gravity was a done deal.$^{6}$ However, there was a fly in the ointment. Newton had shown how gravity behaves, but he had not explained what it was.$^{7}$ Mechanical explanations were popular in the seventeenth century (it was, after all, the golden age of clockwork mechanisms), but in Newton's theory of gravity it was not possible to see where the gear wheels were located. There was no mechanism and no machinery, just an influence teleporting itself through empty space, and it made no sense. What was transmitting the gravity through space? And what even was gravity anyway? It took Einstein's genius to realize that gravity isn't something that just gets transmitted through space. Space, or more accurately spacetime, is a structural property of the gravitational field, with the curvature in the very fabric of spacetime itself [see Fig. 3(b)] being directly determined by the matter within it. In the beautiful phrase coined by Wheeler:$^{8}$
Spacetime tells matter how to move; matter tells spacetime how to curve.
$^{4}$Clearly the notion of a 'sensible' transformation requires some explanation. For now, it can be thought of as the requirement that equations take the same form after transformation. This implies that no new terms should appear in an equation upon transformation to a different system of coordinates.
Fig. 2 The light cone in a spacetime diagram. Time is plotted vertically and the horizontal plane represents two of the three orthogonal spatial directions. An observer at the origin has the potential to influence any event inside her forward light cone and be influenced by any event inside her backward light cone.
$^{5}G$ is the gravitational constant, $6.6741\times 10^{-11}\,\mathrm{N\,kg^{-2}\,m^{2}}$, measured first by Henry Cavendish (1731-1810) in 1798.
$^{6}$Hence, the attitude of Thomas Carlyle in the quotation (written in 1836) that opened this chapter.
$^{7}$Newton knew this, as can be seen in the quotation heading the Preface to this book on page v. Newton had described the force produced by a distant mass, but a real force was felt to require a cause, and Newton couldn't come up with one. In the 1717 preface to his book Opticks, he stated that he would not be taking gravity as an essential property of matter, since he did not know its cause: 'I am not yet satisfied about it for want of experiments'.
$^{8}$John Archibald Wheeler (1911-2008).
$^{9}$General relativity is a classical field theory. By classical we mean that the theory is not compatible with quantum mechanics. The search for a quantum theory of gravitation is still ongoing, a matter we will return to in Chapter 49.
$^{10}$By matter fields we mean those fields describing massive particles or massive fluids, and also those describing phenomena such as electromagnetism, which is represented by a field with energetic, but massless, excitations.
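Newton's insight that a single inverse-square force governs both the apple and the Moon can be checked with a few lines of arithmetic. The Python sketch below scales the acceleration of a falling apple at the Earth's surface by $(R_{\mathrm{E}}/r)^2$ out to the Moon's distance $r$. It then compares the result with the centripetal acceleration $4\pi^2 r/T^2$ needed for a circular orbit with the Moon's period $T$. The numerical values are standard approximate figures supplied for this example, not taken from the text, and the orbit is idealized as circular.

```python
import math

# Approximate standard values, supplied for this illustration.
G_SURFACE = 9.81                   # m/s^2, acceleration of a falling apple
EARTH_RADIUS = 6.371e6             # m
MOON_DISTANCE = 3.844e8            # m, mean Earth-Moon distance
SIDEREAL_MONTH = 27.32 * 86400.0   # s, the Moon's orbital period

# Inverse-square law: surface gravity diluted by (R_E / r)^2 at the Moon's distance.
predicted = G_SURFACE * (EARTH_RADIUS / MOON_DISTANCE) ** 2

# Acceleration the Moon needs to stay on a circular orbit of radius r and period T.
observed = 4.0 * math.pi**2 * MOON_DISTANCE / SIDEREAL_MONTH**2

print(f"inverse-square prediction : {predicted:.4e} m/s^2")
print(f"Moon's centripetal accel. : {observed:.4e} m/s^2")
print(f"ratio                     : {predicted / observed:.3f}")
```

With these inputs the two accelerations agree to within about one per cent.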
General relativity is a field theory that describes gravity. A field is a machine that takes a position in spacetime and outputs an object representing the amplitude of something at that point in spacetime. The amplitude could be a scalar, a vector, a tensor, etc.$^{9}$ Field theories describe matter, such that we speak of the electromagnetic field as describing light and charges, of particle fields as describing elementary particles and of the fluid field as describing the dynamics of continuous fluids. General relativity tells us that the effects we call gravitational reflect the energy content of all of the matter fields$^{10}$ in the Universe. What makes general relativity unique as a field theory is that the energy of these matter fields, and hence gravitation itself, is inextricably linked to another, very special, field: the metric field that describes the geometry of space and time.
The clearest expression of how general relativity describes gravitation is the Einstein equation. This may be written conceptually as
\begin{equation*}
\binom{\text{Curvature of}}{\text{spacetime}} = \binom{\text{Energy density}}{\text{of matter fields}} . \tag{1}
\end{equation*}
The left-hand side of the Einstein equation is geometrical. The curvature is a geometrical property of space and time that follows from the metric field. The right-hand side of the Einstein equation is physical and reflects fields that describe the content of the Universe.
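For orientation only, here is the form in which eqn (1) is usually written. This assumes SI units, a metric of signature $(-,+,+,+)$ and no cosmological constant. The symbols are standard but are not yet defined at this point, and sign conventions differ between texts.

\begin{equation*}
R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} = \frac{8\pi G}{c^{4}}\, T_{\mu\nu}
\end{equation*}

Here $g_{\mu\nu}$ is the metric field. The Ricci tensor $R_{\mu\nu}$ and Ricci scalar $R$ are built from the metric and make up the geometrical, left-hand side. The energy-momentum tensor $T_{\mu\nu}$ of the matter fields makes up the physical, right-hand side.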
In formulating general relativity, Einstein began from this intuition, but initially struggled with the details of how curvature can be described
mathematically using geometrical techniques that were unfamiliar to him. In the century since Einstein's monumental work, there has been a great deal of progress in both the techniques and presentation of geometry, not least following the work of Élie Cartan,$^{11}$ but the reputation for difficulty that general relativity enjoys can still be traced back to the mathematical barrier this material presents to new students of gravitation. Einstein himself was helped by a friend, the mathematician Marcel Grossmann,$^{12}$ to master geometry, but even so he worked tirelessly for several more years before the theory was complete. In this spirit of friendly help, the opening sections of this book are designed to help the gifted amateur understand the mathematical language of the left-hand (geometrical) side of the Einstein equation, but in due course we will fill in the details of both sides.
0.3 What is a metric?
General relativity concerns the metric field, but what is that? The metric field can be thought of as a set of rules that allow us to work out the distances and angles between points in space and time. The geometrical description links space and time so inseparably that we refer to them as a single entity,$^{13}$ spacetime. The metric field then describes geometry by providing the distances and angles between points in spacetime, known as events. The metric itself can be expressed by writing down the metric line element, which is an equation for the interval between two closely spaced events. This allows us to carry out thought experiments where we imagine that spacetime has various particular curvatures and then investigate the consequences.
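A line element can be written as $ds^2 = \sum_{\mu,\nu} g_{\mu\nu}\,dx^\mu dx^\nu$, where the metric components $g_{\mu\nu}$ play the role of the 'set of rules'. The Python sketch below assumes flat spacetime in Cartesian coordinates, units with $c = 1$, and the sign convention $ds^2 = -dt^2 + dx^2 + dy^2 + dz^2$ (other texts use the opposite overall sign). For comparison, it also applies the same rule to ordinary three-dimensional Euclidean space. The names are illustrative.

```python
# Metric components g_{mu nu} of flat spacetime, assuming c = 1 and the
# signature (-, +, +, +); coordinates are ordered (t, x, y, z).
MINKOWSKI = [
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Ordinary Euclidean space in Cartesian coordinates (x, y, z), for comparison.
EUCLIDEAN_3D = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]


def interval_squared(metric, dx):
    """Line element ds^2 = sum over mu, nu of g_{mu nu} dx^mu dx^nu for a small separation dx."""
    n = len(dx)
    return sum(metric[mu][nu] * dx[mu] * dx[nu] for mu in range(n) for nu in range(n))


for dx in [(1.0, 0.5, 0.0, 0.0), (0.1, 0.3, 0.4, 0.0)]:
    print("spacetime separation", dx, "-> ds^2 =", interval_squared(MINKOWSKI, dx))

print("Euclidean (3, 4, 0) -> ds^2 =", interval_squared(EUCLIDEAN_3D, (3.0, 4.0, 0.0)))
```

The same summation rule gives Pythagoras' theorem for the Euclidean metric. Swapping in the Minkowski components changes what counts as 'distance' between events.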
Ancient Alexandria's great mathematician Euclid$^{14}$ was never able to prove his parallel postulate: the intuition that two lines that start parallel will continue to be parallel out to infinity. It was realized by geometers in the eighteenth and nineteenth centuries that this is true only for a flat plane, and that consistent geometries in which initially parallel lines converge or diverge are possible in curved spaces. The deviation of initially parallel lines from parallelism gives us a test for, and measure of, curvature. Physically, this means that any change in the separation of two small, uncharged test particles, set off at the same speed on initially parallel paths, must be the consequence of a gravitational field. The information about curvature is encoded in the metric. Einstein's equation is a differential equation that, when solved for a distribution of matter, gives us access to a metric field.
The metric is a field because, in general, it varies throughout spacetime. That is to say, we insert a position in spacetime into the metric field and get back a metric that allows us to compute the distance between events in that part of spacetime. The left-hand side of the Einstein equation can be thought of as a differential equation describing the variation of the metric field in spacetime, and hence we obtain our notion of the curvature of spacetime [see Fig. 3(b)].
$^{11}$ Élie Joseph Cartan (1869-1951).
$^{12}$ Marcel Grossmann (1878-1936).
$^{13}$ This notion of a single entity requires another conceptual leap: the coordinates in the metric are of no intrinsic significance. The symbol $t$, to which we have grown accustomed for representing time, becomes less important.
$^{14}$ Euclid (who lived around 300 BC).
$^{15}$ Why? The Newtonian theory is known to provide a good description of gravitation in many of the circumstances in which we encounter it, i.e. in the limit of weak gravitational fields and of particles travelling slowly compared with the speed of light.
Fig. 4 The gravitational field $\vec{g}(\vec{r})$ around a particle of mass $M$.
Fig. 5 The gravitational potential $\Phi(r)$ (a scalar field) at a distance $r$ from a particle of mass $M$ at the origin.
$^{16}$ Henry Cavendish used a torsion balance to measure the tiny gravitational attraction between lead spheres.
0.4 What are we building on?
General relativity supersedes Newton's theory of gravity, but the two theories should agree when gravitational fields are weak.$^{15}$ Therefore, it is worth restating the older Newtonian theory: Newton asserted that the force $\vec{F}$ on a point mass $m$ at position $\vec{r}$ due to a point mass $M$ at the origin is given by the vector equation
\begin{equation*}
\vec{F}=-\frac{G M m}{r^{2}} \hat{\vec{r}}, \tag{2}
\end{equation*}
in which arrows denote three-dimensional vectors, $\hat{\vec{r}}$ is the unit vector in the direction of $\vec{r}$, and the minus sign expresses the fact that gravity is an attractive force. Since the force scales with the mass $m$, we can define a gravitational field vector $\vec{g}$ as the force per unit mass, i.e.
\begin{equation*}
\vec{g}=\frac{\vec{F}}{m}, \tag{3}
\end{equation*}
and this is in fact a vector field $\vec{g}(\vec{r})$ that depends on position $\vec{r}$. For a point mass, we then have (see Fig. 4)
\begin{equation*}
\vec{g}(\vec{r})=-\frac{G M \hat{\vec{r}}}{r^{2}} . \tag{4}
\end{equation*}
The gravitational field is a conservative field of force (meaning that the net work done in moving a point mass around a closed loop is zero; for example, the work done in rolling a ball up a frictionless hill is recovered as energy when it rolls back down again), and hence we can write it as the gradient of a scalar potential. Conventionally, we include a minus sign and so write
\begin{equation*}
\vec{g}=-\vec{\nabla} \Phi, \tag{5}
\end{equation*}
where $\Phi(\vec{r})$ is a scalar field known as the gravitational potential. For the case of a point mass $M$ at the origin, $\Phi(\vec{r})=-G M / r$ (see Fig. 5). Gauss' theorem (to be discussed below) shows that if the mass at the origin is not point-like, but is spherically symmetric, then outside the radius of the mass distribution the same results still hold.
Example 0.3
For a test mass on the surface of the Earth, the gravitational force is $F=m g$, where $g=9.81\,\mathrm{m\,s^{-2}}$. Following the Cavendish experiment$^{16}$ of 1798 (and later improvements on it), the gravitational constant $G$ was measured to be $6.6741 \times 10^{-11}\,\mathrm{N\,kg^{-2}\,m^{2}}$. Cavendish described this experiment as 'weighing the world' because we can then use eqn 4 to deduce that
\begin{equation*}
g=\frac{G M_{\oplus}}{R_{\oplus}^{2}}, \tag{6}
\end{equation*}
where $R_{\oplus}=6.378 \times 10^{6}\,\mathrm{m}$ is the radius of the Earth. This then gives the mass of the Earth as $M_{\oplus}=g R_{\oplus}^{2}/G=5.97 \times 10^{24}\,\mathrm{kg}$. Here we use the standard symbol $\oplus$ to denote the Earth. With these two numbers, we can also work out the mean density of the Earth by dividing the mass $M_{\oplus}$ by the volume $\frac{4}{3} \pi R_{\oplus}^{3}$, which yields a value of $5.5 \times 10^{3}\,\mathrm{kg\,m^{-3}}$, just over five times the density of water.
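The arithmetic of 'weighing the world' is easy to reproduce. The short sketch below simply rearranges $g = G M_{\oplus}/R_{\oplus}^{2}$ for $M_{\oplus}$ and divides by the volume of a sphere, using the values of $g$, $G$ and $R_{\oplus}$ quoted above; the variable names are our own, and the density of water is assumed to be the round value $10^{3}\,\mathrm{kg\,m^{-3}}$.

```python
import math

# Values quoted in the text (SI units)
G = 6.6741e-11        # gravitational constant, N kg^-2 m^2
g_surface = 9.81      # acceleration due to gravity at the surface, m s^-2
R_earth = 6.378e6     # radius of the Earth, m
RHO_WATER = 1.0e3     # density of water, kg m^-3 (assumed round value)

# Rearranging g = G M / R^2 gives the mass of the Earth
M_earth = g_surface * R_earth**2 / G

# Mean density = mass / volume of a sphere of radius R_earth
rho_earth = M_earth / (4.0 / 3.0 * math.pi * R_earth**3)

print(f"M_earth   = {M_earth:.3e} kg")
print(f"rho_earth = {rho_earth:.2e} kg m^-3")
print(f"ratio to water = {rho_earth / RHO_WATER:.2f}")
```

The results agree with the values quoted in the text to within a fraction of a per cent.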
We can play a similar game with our nearest star, the Sun. The Earth's orbit around the Sun is elliptical, but it is not far from circular, so for an estimate we can equate the gravitational force on the Earth due to the Sun, $G M_{\odot} M_{\oplus} / R_{\mathrm{ES}}^{2}$, to the centripetal force $M_{\oplus} v^{2} / R_{\mathrm{ES}}$, where $v$ is the speed of the Earth, $M_{\odot}$ is the mass of the Sun ($\odot$ being the symbol we use to denote the Sun) and $R_{\mathrm{ES}}$, the separation of the Sun and the Earth, is called the astronomical unit (abbreviated A.U.). The value of $R_{\mathrm{ES}}$ was first estimated by the Greeks by measuring the angle between the half-Moon and the Sun (see Fig. 6); the estimate was subsequently improved in the seventeenth century and later by measuring the solar parallax using transits of Venus. A modern value is $1.496 \times 10^{11}\,\mathrm{m}$. The period $\tau$ of the circular orbit is related to $v$ and $R_{\mathrm{ES}}$ by $\tau=2 \pi R_{\mathrm{ES}} / v$, and equating the centripetal and gravitational forces yields $v^{2}=G M_{\odot} / R_{\mathrm{ES}}$. Eliminating $v$ gives $M_{\odot}=4\pi^{2} R_{\mathrm{ES}}^{3}/(G\tau^{2})$, so from $\tau=1$ year we deduce that $M_{\odot}=1.99 \times 10^{30}\,\mathrm{kg}$. The density of the Sun, using $R_{\odot}=6.96 \times 10^{8}\,\mathrm{m}$, then works out to be around $1.4 \times 10^{3}\,\mathrm{kg\,m^{-3}}$, just a bit larger than that of water.
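The same kind of estimate can be sketched for the Sun. Eliminating $v$ between $\tau = 2\pi R_{\mathrm{ES}}/v$ and $v^{2} = G M_{\odot}/R_{\mathrm{ES}}$ gives $M_{\odot} = 4\pi^{2} R_{\mathrm{ES}}^{3}/(G\tau^{2})$. The code below evaluates this with the numbers quoted in the text, assuming a perfectly circular orbit and a year of 365.25 days (our choice of year length).

```python
import math

G = 6.6741e-11               # gravitational constant, N kg^-2 m^2
R_ES = 1.496e11              # Earth-Sun separation (1 A.U.), m
R_sun = 6.96e8               # radius of the Sun, m
tau = 365.25 * 24 * 3600.0   # one year in seconds (365.25 days assumed)

# Circular orbit: v = 2 pi R_ES / tau and v^2 = G M_sun / R_ES,
# so M_sun = 4 pi^2 R_ES^3 / (G tau^2)
M_sun = 4.0 * math.pi**2 * R_ES**3 / (G * tau**2)

# Mean density of the Sun = mass / volume of a sphere
rho_sun = M_sun / (4.0 / 3.0 * math.pi * R_sun**3)

print(f"M_sun   = {M_sun:.3e} kg")
print(f"rho_sun = {rho_sun:.2e} kg m^-3")
```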
One conclusion from all of this is that, from a nineteenth-century perspective, the idea of a black hole (an object with such intense surface gravity that even light could not escape) seems highly unlikely. The escape velocity $v_{\text{esc}}$ from a spherical object of radius $R$, density $\rho$ and mass $M=\frac{4}{3} \pi \rho R^{3}$ is simply worked out by equating the kinetic energy $\frac{1}{2} m v_{\text{esc}}^{2}$ of a launched test mass to the depth $m|\Phi|=G M m / R$ of the potential energy well in which it starts its journey. This yields $v_{\text{esc}}=\sqrt{2 G M / R}=R \sqrt{8 \pi G \rho / 3}$. This result scales linearly with $R$ and would reach $v_{\text{esc}}=c$ only when
\begin{equation*}
R=c \sqrt{\frac{3}{8 \pi G \rho}} . \tag{7}
\end{equation*}
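To see how large such an object would have to be, the sketch below evaluates the radius at which the Newtonian escape velocity reaches $c$. Substituting $M = \frac{4}{3}\pi\rho R^{3}$ into $v_{\text{esc}} = \sqrt{2GM/R}$ gives $v_{\text{esc}} = R\sqrt{8\pi G\rho/3}$, so the critical radius is $R = c\sqrt{3/(8\pi G\rho)}$. The three densities are the round values quoted in this section for water, the Sun and the Earth; the function name is our own.

```python
import math

G = 6.6741e-11    # gravitational constant, N kg^-2 m^2
c = 2.998e8       # speed of light, m s^-1
AU = 1.496e11     # astronomical unit, m

def critical_radius(rho):
    """Radius of a uniform sphere of density rho (kg m^-3) whose Newtonian
    escape velocity R * sqrt(8 pi G rho / 3) equals the speed of light."""
    return c * math.sqrt(3.0 / (8.0 * math.pi * G * rho))

for label, rho in [("water", 1.0e3), ("Sun (mean)", 1.4e3), ("Earth (mean)", 5.5e3)]:
    R = critical_radius(rho)
    print(f"{label:>12}: R = {R:.2e} m = {R / AU:.2f} A.U.")
```

Even for the densest of these, the critical radius comes out larger than an astronomical unit.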
Since the best-studied objects in the Universe were those in our own Solar System, whose mean densities exceed that of water by no more than a factor of about five, and since the rest of the Universe seems to be filled with stars that look somewhat similar to the Sun, eqn 7 would only be likely to be satisfied by an object with a radius of more than an astronomical unit, the distance from the Earth to the Sun. No normal stars were thought to be this big, so it didn't seem as if eqn 7 could ever be satisfied.$^{17}$
Because $\vec{g}$ points inwards towards a point mass $M$ at the origin, and has the same magnitude $G M / R^{2}$ everywhere on a sphere of radius $R$ centred on the origin, we deduce that the integral of $\vec{g}$ over any such spherical surface $S$ is
\begin{equation*}
\int_{S} \vec{g} \cdot \mathrm{d}\vec{S}=-\frac{G M}{R^{2}} \cdot 4 \pi R^{2}=-4 \pi G M . \tag{8}
\end{equation*}
The divergence theorem is a result from vector calculus which says that the left-hand side of this equation, the flux of the vector $\vec{g}$ out of the surface, can be rewritten as an integral over the enclosed volume $V$ of the divergence of $\vec{g}$, written as $\vec{\nabla} \cdot \vec{g}$. Hence, we have
\begin{equation*}
\int_{S} \vec{g} \cdot \mathrm{d}\vec{S}=\int_{V} \vec{\nabla} \cdot \vec{g}\, \mathrm{d} V, \tag{9}
\end{equation*}
where the volume element is $\mathrm{d} V=\mathrm{d}^{3} r$; i.e. the gravitational flux out of a surface is equal to the integral of the divergence of the gravitational field over the volume enclosed by the surface. Because eqn 8 gives the same flux for a sphere of any radius, however small, the divergence must vanish everywhere except at the origin, and we can deduce that$^{18}$
\begin{equation*}
\vec{\nabla} \cdot \vec{g}=-4 \pi G M \delta(\vec{r}) . \tag{12}
\end{equation*}
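Combining eqn 12 with the divergence theorem implies that the flux of $\vec{g}$ through a closed surface depends only on whether the mass lies inside it, not on the size of the surface or on where inside it the mass sits. The sketch below checks this numerically for a sphere centred on the origin, with a point mass at the centre, off-centre inside, and outside. It uses units with $G = M = 1$ purely for convenience (so the expected flux for an enclosed mass is $-4\pi$) and a simple midpoint rule in the polar and azimuthal angles; both choices are ours, not the book's.

```python
import math

def flux_through_sphere(mass_pos, radius, n_theta=100, G=1.0, M=1.0):
    """Approximate the surface integral of g . dS over a sphere of the given
    radius centred on the origin, for a point mass M located at mass_pos.
    Uses the midpoint rule in theta (n_theta cells) and phi (2 * n_theta cells)."""
    n_phi = 2 * n_theta
    dtheta = math.pi / n_theta
    dphi = 2.0 * math.pi / n_phi
    mx, my, mz = mass_pos
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * dtheta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for j in range(n_phi):
            phi = (j + 0.5) * dphi
            # Outward unit normal at this point on the sphere
            nx, ny, nz = sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t
            # Vector from the mass to the surface point
            dx, dy, dz = radius * nx - mx, radius * ny - my, radius * nz - mz
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            # g = -G M (r - r_mass) / |r - r_mass|^3, dotted with the normal
            g_dot_n = -G * M * (dx * nx + dy * ny + dz * nz) / dist**3
            # Area element on the sphere: R^2 sin(theta) dtheta dphi
            total += g_dot_n * radius**2 * sin_t * dtheta * dphi
    return total

expected = -4.0 * math.pi   # -4 pi G M with G = M = 1
for pos, radius in [((0.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 0.0), 5.0),
                    ((0.0, 0.0, 0.3), 1.0), ((0.0, 0.0, 2.0), 1.0)]:
    ratio = flux_through_sphere(pos, radius) / expected
    print(f"mass at {pos}, sphere radius {radius}: flux / (-4 pi G M) = {ratio:.5f}")
```

The first three ratios should come out very close to 1, and the last very close to 0, as expected for a mass lying outside the surface.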
Fig. 6 Diagram (not to scale) showing how the distance to the Sun can be estimated by measuring the angle between the Sun and the half-Moon. The distances are given in A.U. The calculation relies on knowing the distance to the Moon, which can be estimated from measurements of lunar parallax.
$^{17}$ As we shall see later, it is possible to have compact objects, such as neutron stars, which have enormous densities. These only became possible to understand after the development of quantum mechanics.
$^{18}$ The Dirac delta function $\delta(x)$ is a function localized at the origin and which has integral unity. It is the perfect model of a localized particle, and is used here to fix the point mass $M$ at the origin. We have written a three-dimensional delta function $\delta(\vec{r}) \equiv \delta(x) \delta(y) \delta(z)$, often denoted $\delta^{(3)}(\vec{x})$. The integral of a $d$-dimensional Dirac delta function $\delta^{(d)}(\vec{x})$ is given by
\begin{equation*}
\int \mathrm{d}^{d} x\, \delta^{(d)}(\vec{x})=1 . \tag{10}
\end{equation*}
It is defined by
\begin{equation*}
\int \mathrm{d}^{d} x\, f(\vec{x}) \delta^{(d)}(\vec{x})=f(0) . \tag{11}
\end{equation*}
$^{19}$ An arbitrary distribution of mass can be written as an integral of point masses using
\begin{equation*}
\rho(\vec{r})=\int \rho(\vec{r}^{\prime})\, \delta^{(3)}(\vec{r}-\vec{r}^{\prime})\, \mathrm{d}^{3} r^{\prime},
\end{equation*}
so that, by superposition, eqn 12 generalizes to eqn 14. Integrating eqn 14 over the volume enclosed by a surface $S$ and using the divergence theorem then gives
\begin{equation*}
\int_{S} \vec{g} \cdot \mathrm{d}\vec{S}=-4 \pi G M,
\end{equation*}
where $M=\int \rho(\vec{r}^{\prime})\, \mathrm{d}^{3} r^{\prime}$ is the total mass enclosed inside the surface $S$. This result, which generalizes eqn 8 to an arbitrary distribution of mass, is often known as Gauss' theorem for gravitational fields.
$^{20}$ In electrostatics, the force on a charge $q$ is $\vec{F}=q \vec{E}$, where $\vec{E}=-\vec{\nabla} \phi$ is the electric field and $\phi$ is the electrostatic potential. Gauss' theorem for electrostatics is (in SI units)
\begin{equation*}
\int_{S} \vec{E} \cdot \mathrm{d}\vec{S}=\frac{Q}{\epsilon_{0}},
\end{equation*}
where $Q$ is the charge enclosed by the surface $S$, and $\vec{\nabla} \cdot \vec{E}=\rho / \epsilon_{0}$, where $\rho$ here is the charge density, and
\begin{equation*}
\nabla^{2} \phi=-\frac{\rho}{\epsilon_{0}}
\end{equation*}
is Poisson's equation.
$^{21}$ Carlyle may have said that 'Our Theory of Gravitation is as good as perfect...' in the quote that opened the chapter, but this discrepancy turned out to be rather significant!
The Newtonian results for a point mass can be generalized to the field due to a distribution of mass, since Newtonian theory is linear. Thus, for example,
\begin{equation*}
\Phi(\vec{r})=-G \int \frac{\rho(\vec{r}^{\prime})}{|\vec{r}-\vec{r}^{\prime}|}\, \mathrm{d}^{3} r^{\prime} \tag{13}
\end{equation*}
is the gravitational potential at position $\vec{r}$ from a distribution of masses with density $\rho(\vec{r}^{\prime})$. The divergence of the gravitational field can then be written (generalizing eqn 12) as$^{19}$
\begin{equation*}
\vec{\nabla} \cdot \vec{g}(\vec{r})=-4 \pi G \rho(\vec{r}) . \tag{14}
\end{equation*}
Equivalently, this can be written using the gravitational potential $\Phi$, via $\vec{g}=-\vec{\nabla} \Phi$, to yield
\begin{equation*}
\nabla^{2} \Phi=4 \pi G \rho , \tag{15}
\end{equation*}
which is analogous to Poisson's equation in electrostatics.$^{20}$
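Equation 15 can be checked numerically on a simple test case. We assume (an illustrative choice of ours, not a model from the text) a uniform sphere of density $\rho$ and radius $R$, whose potential is $\Phi = -\frac{2\pi G\rho}{3}(3R^{2} - r^{2})$ inside and $\Phi = -GM/r$ outside, with $M = \frac{4}{3}\pi\rho R^{3}$. A central-difference Laplacian should then return $4\pi G\rho$ inside the sphere and zero outside. The Earth-like numbers are borrowed from Example 0.3, and the function names are hypothetical.

```python
import math

G = 6.6741e-11   # gravitational constant, N kg^-2 m^2
rho = 5.5e3      # density, kg m^-3 (Earth-like, assumed uniform)
R = 6.378e6      # radius of the sphere, m
M = 4.0 / 3.0 * math.pi * rho * R**3

def potential(x, y, z):
    """Newtonian potential of a uniform sphere of density rho centred on the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    if r < R:
        return -2.0 * math.pi * G * rho / 3.0 * (3.0 * R**2 - r**2)
    return -G * M / r

def laplacian(f, x, y, z, h=1.0e3):
    """Second-order central-difference approximation to the Laplacian of f."""
    f0 = f(x, y, z)
    return ((f(x + h, y, z) - 2.0 * f0 + f(x - h, y, z))
            + (f(x, y + h, z) - 2.0 * f0 + f(x, y - h, z))
            + (f(x, y, z + h) - 2.0 * f0 + f(x, y, z - h))) / h**2

lap_inside = laplacian(potential, 1.0e6, 2.0e6, 0.5e6)   # r of about 2.3e6 m, inside
lap_outside = laplacian(potential, 1.0e7, 0.0, 5.0e6)    # r of about 1.1e7 m, outside
print(lap_inside / (4.0 * math.pi * G * rho))  # should be close to 1
print(lap_outside)                             # should be close to 0
```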
The dimensions of the gravitational potential $\Phi$ are (velocity)$^{2}$, so one might wonder what happens when $|\Phi|$ becomes of the same order as $c^{2}$, where $c$ is the speed of light. This is equivalent to the gravitational potential energy $m|\Phi|$ of a mass $m$ becoming of the same order as its rest-mass energy $m c^{2}$. This gives a rough criterion for when Newton's law of gravitation is likely to break down and the effects of general relativity become extremely important. However, as we shall see in this book, the effects of general relativity can become significant even before this point is reached.
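To put numbers to this criterion, the short sketch below evaluates the dimensionless ratio $|\Phi|/c^{2} = GM/(Rc^{2})$ at the surfaces of the Earth and the Sun, using the masses and radii from Example 0.3. The helper name is our own.

```python
G = 6.6741e-11   # gravitational constant, N kg^-2 m^2
c = 2.998e8      # speed of light, m s^-1

def potential_over_c2(M, R):
    """|Phi| / c^2 = G M / (R c^2) at the surface of a spherical mass M of radius R (SI units)."""
    return G * M / (R * c**2)

earth_ratio = potential_over_c2(5.97e24, 6.378e6)
sun_ratio = potential_over_c2(1.99e30, 6.96e8)
print(f"Earth: |Phi|/c^2 = {earth_ratio:.1e}")
print(f"Sun:   |Phi|/c^2 = {sun_ratio:.1e}")
```

The two ratios are roughly $7\times10^{-10}$ and $2\times10^{-6}$, both far below unity, consistent with Newtonian gravity working well throughout the Solar System.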
Example 0.4
Effects such as gravitational time dilation are certainly measurable, if not dramatic, on the surface of planet Earth (where $|\Phi| \lll c^{2}$) and are important for the accurate operation of the global positioning system (GPS).
Mercury, the innermost planet in the Solar System, has a larger orbital speed than any other planet (though at $47\,\mathrm{km\,s^{-1}}$ this is less than $0.0002c$, so you wouldn't have thought relativistic effects would be that important). The axis of its orbit slowly precesses, by about 575 arcseconds per century, and most of this (about 532 arcseconds per century) is due to the gravitational effects of other bodies in the Solar System, perfectly calculable by Newtonian gravity. However, despite careful calculations, a discrepancy$^{21}$ of about 43 arcseconds per century stubbornly resisted explanation, until Einstein's general relativity came to the rescue.
0.5 Who is this book for?
As with our earlier book on quantum field theory, our imagined reader is an amateur. We have written this book for someone wanting to learn general relativity without (at least initially) joining the ranks of professional relativists; but (s)he is gifted, possessing a curious and adaptable mind and willing to embark on a significant intellectual challenge. (S)he has abundant curiosity about the physical world, a basic grounding in undergraduate physics, and a desire to be told an entertaining and intellectually stimulating story, but will not feel patronized if a few mathematical niceties are spelled out in detail. We appreciate that some readers will want to get to the physical predictions of the theory as soon as possible, as their primary concern is with understanding what the Universe is actually like. Others will have more interest in the mathematical structure of the theory; such readers will want to know more about how some more advanced geometric formalism can yield additional insights. We have tried to cater for both types of reader and have designed the book so that it is possible to dip in and out of sections that may be more or less to a reader's taste, though we recommend that all beginners persevere with at least the first thirteen chapters.
The book is structured as follows. We begin in Part I with an introduction to the geometry of flat spacetime, reviewing special relativity and setting up the mathematics of the metric. Part II introduces the mathematics of curvature, sets up the physics of general relativity and finishes with the Einstein field equation. Part III applies these ideas to the Universe and studies various models used in cosmology. Part IV turns to smaller structures inside the Universe: stars, black holes and their orbits. Part V contains a more formal treatment of geometry which may be of more interest to those with more mathematical inclinations. Part VI considers general relativity as a type of field theory and examines how one might link the ideas in our best theory of gravitation to our most successful theories of quantum fields. Before we get going, we will say a few words about units.
0.6 Units in this book
Most readers will be familiar with SI units and we will begin the book using them. However, once we get going, we will switch over to what are known as geometrized units, in which we set $G=c=1$.$^{22}$ This has the great advantage of simplifying equations into more memorable forms, since they will no longer be encumbered with unnecessary factors of $c$ and $G$ whose presence, to the experts, is 'obvious'. It of course has the great disadvantage of creating some confusion whenever a numerical result is needed, but after a bit of practice this does become second nature. Because of the potential confusion for newcomers to the field, we will frequently translate back to SI units (which we will tend to call 'real-world' units) when we need to. Here is an explanation of how to translate between the two systems.
Example 0.5
Conversion factors to convert from quantities expressed in real-world units into geometrized units can be computed by noting that the dimension$^{23}$ of $c$ is $\mathrm{L}/\mathrm{T}$ in the real world, while the dimension of $G/c^{2}$ is $\mathrm{L}/\mathrm{M}$. To convert a quantity with the real-world dimension of time into geometrized units, multiply by $c$. To convert a quantity with the real-world dimension of mass, multiply by $G/c^{2}$. Both of these quantities then have units of length in the geometrized system.
↷ It is certainly not necessary to read the book in order. In fact, we would recommend skipping several sections on a first reading. Boxes like this one are intended to allow you to navigate a path through the text.
$^{22}$ The reader will be let in gently to geometrized units. We will not begin to set $c=1$ until Chapter 2, and will not set $G=1$ as well until Part III. In addition, Appendix B contains a summary of the units we use to discuss electromagnetism, along with a summary of useful notation.
$^{23}$ We denote the dimension of length by L, time by T and mass by M.
The generalized version of the above argument says that if a quantity has dimensions $\mathrm{L}^{n}\mathrm{T}^{m}\mathrm{M}^{p}$ in the real world, then it has dimensions $\mathrm{L}^{n+m+p}$ in the geometrized system, and the conversion factor is $c^{m}(G/c^{2})^{p}$. The table gives some examples.
| Quantity | real world | geometrized | conversion |
| :--- | :---: | :---: | :---: |
| Length | L | L | 1 |
| Time | T | L | $c$ |
| Mass | M | L | $G / c^{2}$ |
| Velocity | $\mathrm{L} \mathrm{~T}^{-1}$ | 1 | $c^{-1}$ |
| Energy | $\mathrm{L}^{2} \mathrm{~T}^{-2} \mathrm{M}$ | L | $G / c^{4}$ |
| Energy density | $\mathrm{L}^{-1} \mathrm{~T}^{-2} \mathrm{M}$ | $\mathrm{L}^{-2}$ | $G / c^{4}$ |
| Mass density | $\mathrm{L}^{-3} \mathrm{M}$ | $\mathrm{L}^{-2}$ | $G / c^{2}$ |
| Pressure | $\mathrm{L}^{-1} \mathrm{~T}^{-2} \mathrm{M}$ | $\mathrm{L}^{-2}$ | $G / c^{4}$ |
If you want to convert an equation expressed in geometrized units into real-world units, multiply the quantities in the table by their respective conversion factors.
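The conversion rule is mechanical enough to encode directly. The sketch below takes a value in SI units together with the exponents $n$, $m$ and $p$ of its real-world dimensions $\mathrm{L}^{n}\mathrm{T}^{m}\mathrm{M}^{p}$ and multiplies by $c^{m}(G/c^{2})^{p}$, returning the value in geometrized units with lengths measured in metres. The function name and the rounded constants are our own choices.

```python
G = 6.6741e-11   # gravitational constant, N kg^-2 m^2
c = 2.998e8      # speed of light, m s^-1

def to_geometrized(value, n, m, p):
    """Convert an SI quantity with dimensions L^n T^m M^p into geometrized
    units, where it has dimension L^(n + m + p), by multiplying by c^m (G/c^2)^p."""
    return value * c**m * (G / c**2)**p

one_second = to_geometrized(1.0, 0, 1, 0)          # time: 1 s -> metres
sun_mass = to_geometrized(1.99e30, 0, 0, 1)        # mass: M_sun -> metres
water_density = to_geometrized(1.0e3, -3, 0, 1)    # mass density -> m^-2
print(f"1 s       -> {one_second:.3e} m")
print(f"M_sun     -> {sun_mass:.3e} m")
print(f"rho_water -> {water_density:.3e} m^-2")
```

For example, one solar mass corresponds to a length of roughly 1.5 km in geometrized units.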
Example 0.6
In geometrized units, the Einstein equation is $\boldsymbol{G}=8 \pi \boldsymbol{T}$, where $\boldsymbol{G}$ and $\boldsymbol{T}$ are tensors that we will define later in the book (and should not be confused with the gravitational constant $G$ and temperature $T$). The left-hand side of the Einstein equation has units $\mathrm{L}^{-2}$, and the right-hand side has units of energy density ($\mathrm{L}^{-1} \mathrm{~T}^{-2} \mathrm{M}$ in the real world). The left-hand side is multiplied by unity, the right-hand side by $G / c^{4}$, and we obtain
\begin{equation*}
\boldsymbol{G}=\frac{8 \pi G}{c^{4}} \boldsymbol{T} .
\end{equation*}
(0.1) Show using Newtonian theory that the escape velocity from the surface of a star of mass $M$ and radius $r$ is $v_{\text{esc}}=\sqrt{2 G M / r}=\sqrt{2|\Phi|}$. Show that the condition $v_{\text{esc}}=c$ will occur if $r=2 G M / c^{2}$, which is known as the Schwarzschild radius.
(0.2) Estimate the surface gravity $g$ and the escape velocity $v_{\text{esc}}$ for (i) the surface of the Earth ($R_{\oplus}=6.378 \times 10^{6}\,\mathrm{m}$, $M_{\oplus}=5.97 \times 10^{24}\,\mathrm{kg}$), (ii) the surface of the Sun ($R_{\odot}=6.96 \times 10^{8}\,\mathrm{m}$, $M_{\odot}=1.99 \times 10^{30}\,\mathrm{kg}$), and (iii) the surface of a $1.4 M_{\odot}$ neutron star with radius 10 km.
(0.3) Evaluate the tidal force (the difference in gravitational forces from one end [head] to the other [feet]) on a 1.8 m tall human being (i) standing on the Earth, (ii) at the Schwarzschild radius of a $3 M_{\odot}$ black hole with her body aligned in a radial direction, and (iii) the same as (ii) but for a $10^{6} M_{\odot}$ black hole.
Part I
Geometry and mechanics in flat spacetime
In this introductory part of the book, we trace the development of the picture of the Universe which underpins relativity.
Based upon the principle that light travels at $c$ in all inertial frames, we describe the geometry of spacetime in Chapter 1 and show that the consequences that stem from this are surprisingly far-reaching.
In Chapter 2, we show how vectors are treated in special relativity and how the dynamics of particles in flat spacetime can be obtained from the principle of least action.
Chapter 3 is concerned with coordinates. Sometimes we choose a geometric, coordinate-free approach, but often we have to choose a particular coordinate system. We consider Cartesian and non-Cartesian bases and how to transform from one to the other.
We introduce tensors in Chapter 4, describing them as 'linear slot machines' into which you insert a number of vectors and their dual objects, which are called 1-forms; the slot machine then spits out a number. Vectors and 1-forms are themselves both tensors, as is the energy-momentum tensor, which we also introduce.
In Chapter 5, we consider a very special tensor: the metric tensor. The metric tensor encodes information about the spacetime, how distance is measured and also whether the spacetime is curved.
1
1.1 A common sense start
1.2 The speed of light
1.3 Light cones and the Lorentz transformation
1.5 Experiments
Chapter summary
Exercises
Fig. 1.1 Spacetime diagram for our naive conception of the past, present and future. In particular, the present 'now' is represented by a horizontal line.
$^{1}$ A few calculations may be needed to correct for light-travel-time effects (estimating the time delay in getting signals from you and your aunt to the space station), but after doing this it will make perfect sense for everyone to talk about those two sandwich-biting events occurring at precisely the same instant.
$^{2}$ Although for the latter case we will need to be sent a signal of when the train left Paris, and will have to make a correction for the time taken for the signal to get to us.
Special relativity
Nowadays most people die of a sort of creeping common sense, and discover when it is too late that the only things one never regrets are one's mistakes.
Oscar Wilde (1854-1900) The Picture of Dorian Gray
1.1 A common sense start
Einstein revolutionized our thinking about reality. To appreciate why, let's start by stating some basic, obvious notions that would be self-evident to anyone who hadn't been exposed to Einstein's ideas. These are so straightforward that they might seem unnecessary to state, but we will do so because they turn out, in fact, to be wrong.
(1) The notion of now: For a start, we all understand how time rolls on inexorably for all of us. We all live in 'now', we leave the 'past' behind, and march into the 'future'. This is something we all experience, and as we look out of the window we see what others in the world are doing right now. Of course, if we train our telescopes on a distant galaxy, we might be observing it as it was several billion years ago. But that's just a light-travel-time effect. We can sensibly talk about what the inhabitants of the Andromeda galaxy might be doing right now, even if we can't see them. We could draw a spacetime diagram of this picture of reality and it would look like the one in Fig. 1.1.
(2) The notion of simultaneity: Because time is a quantity that we all experience identically (we all march to the same beat of the drum), we can make statements about simultaneity, such as 'at the exact same moment that I took my first bite of the sandwich in London, my aunt in Melbourne took the first bite of her sandwich'. We expect this statement to be universally true, agreed upon by all observers, so that if it is true for you and your aunt, it will be true for an observer at any vantage point (even one on the International Space Station).$^{1}$
(3) Time intervals: Next, if we measure the time that something lasts, like a particular journey from Paris to Strasbourg, then we will get the same answer whether we are on the train or standing at Strasbourg station.$^{2}$ Moreover, the rate at which time elapses surely doesn't depend on your altitude above sea level. It would be ridiculous for time to go at a different rate on the top floor of a building than in the basement. Time intervals are therefore something that everyone can agree on, irrespective of their frame of reference.
(4) Spatial intervals: Moreover, intervals in space are similarly universal. If you measure the length of a moving train carriage as a passenger, you should get the same answer as an observer standing on the station platform.$^{3}$ Again, completely self-evident.
These concepts are all intuitively obvious. It was Einstein's particular genius to understand that, amazingly, our 'common sense' intuition is at fault and that these supposedly self-evident concepts must be abandoned.
1.2 The speed of light
By the start of the twentieth century, physicists were faced with a series of rather profound questions about how light propagates that put many accepted notions of physics at risk.$^{4}$ Einstein was motivated by a desire to save Maxwell's equations of electromagnetism, which show that the speed of light, $c$, is related to the permittivity $\epsilon_{0}$ and permeability $\mu_{0}$ of free space via the famous equation
\begin{equation*}
c=\frac{1}{\sqrt{\epsilon_{0} \mu_{0}}} . \tag{1.1}
\end{equation*}
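As a quick numerical check of the relation $c = 1/\sqrt{\epsilon_{0}\mu_{0}}$, the sketch below plugs in the CODATA 2018 recommended values of the vacuum permittivity and permeability. These values are supplied by us for illustration; they are not quoted in the text.

```python
import math

epsilon_0 = 8.8541878128e-12   # vacuum permittivity, F m^-1 (CODATA 2018)
mu_0 = 1.25663706212e-6        # vacuum permeability, N A^-2 (CODATA 2018)

# Speed of light from the electromagnetic constants
c = 1.0 / math.sqrt(epsilon_0 * mu_0)
print(f"c = {c:.9e} m s^-1")
```

The result agrees with the defined value $c = 299\,792\,458\,\mathrm{m\,s^{-1}}$ to well within a metre per second.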
But speed is a relative quantity. A car travels at 50 miles per hour with respect to the road. What does light travel with respect to? If you are travelling at speed $c/2$ with respect to a laser which emits a beam of light travelling in the opposite direction to you, does the light travel with respect to you at a speed $\frac{c}{2}-(-c)=\frac{3 c}{2}$? If you then measured the speed of light to be $\frac{3 c}{2}$, how could you reconcile that with Maxwell's equations? Einstein concluded that eqn 1.1 holds universally and that the speed of light is the same for all observers in all inertial reference frames.$^{5}$ The consequences of this bold assumption for spacetime geometry are far-reaching. Before we get to these, let's start with a review of some notions of ordinary geometry.
Example 1.1
The two-dimensional $x y$ plane is shown in Fig. 1.2(a). The point $(x, y)$ is a distance $d=\sqrt{x^{2}+y^{2}}$ from the origin. If we rotate the coordinates through an angle $\theta$ [Fig. 1.2(b)] so that $x \rightarrow x'$ and $y \rightarrow y'$, we want this distance to be unchanged, so that $x^{2}+y^{2}=x'^{2}+y'^{2}$. A linear transformation that accomplishes this is given by
\begin{equation*}
\begin{pmatrix} x' \\ y' \end{pmatrix}=\begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},
\end{equation*}
which works because $\cos^{2} \theta+\sin^{2} \theta=1$. The matrix in this equation is known as a rotation matrix. The set of points at a given distance from the origin is, obviously, a circle centred on the origin, so points at different distances lie on concentric circles. The shortest path between the origin and a point $(x, y)$ is, of course, a straight line, and that straight line intersects all of those circles at right angles.
$^{3}$ It is easier to make the measurement as a passenger on the train (just run a very long tape measure from one end to the other). On the platform, you would need to measure where the front and the back of the moving train are at some simultaneous instant. This is harder to do in practice, but perfectly possible in principle. You would naively expect to get the same answer in both cases.
$^{4}$ Specifically, the assumption that light was a mechanical wave propagating in an ether raised several troubling issues. See the book by Cheng (Appendix A) for the history.
$^{5}$ Reminder: an inertial reference frame, or inertial frame, is a reference frame that is not accelerating.
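The invariance of $x^{2}+y^{2}$ under rotation is easy to confirm numerically. The sketch below assumes the rotation $x' = x\cos\theta + y\sin\theta$, $y' = -x\sin\theta + y\cos\theta$ (a rotation of the axes through an angle $\theta$) and checks that the distance of the point $(3, 4)$ from the origin stays equal to 5 for several angles. The function name is our own.

```python
import math

def rotate(x, y, theta):
    """Coordinates (x', y') of the point (x, y) after rotating the axes by theta."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return cos_t * x + sin_t * y, -sin_t * x + cos_t * y

x, y = 3.0, 4.0
for theta in (0.0, 0.3, 1.0, math.pi / 2, 2.5):
    xp, yp = rotate(x, y, theta)
    # x'^2 + y'^2 = x^2 + y^2 because cos^2(theta) + sin^2(theta) = 1
    print(f"theta = {theta:.3f}: x' = {xp:+.4f}, y' = {yp:+.4f}, d = {math.hypot(xp, yp):.12f}")
```

The individual coordinates change with $\theta$, but the distance $d$ printed in the last column does not (up to floating-point rounding).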
Fig. 1.2 The $x y$ plane. The distance $d$ between a point $(x, y)$ and the origin is (of course) independent of whether the coordinates used are (a) $x$ and $y$ or (b) the rotated $x'$ and $y'$.
Fig. 1.3 A light source flashes at the origin at $t=0$ and a spherical wave front, with radius $c t$, expands outward.
$^{6}$ We refer to points in spacetime as events. An event is something which happens at a particular place and a particular time: a photon is emitted, a photon is absorbed, a gun is fired, a balloon bursts. Each event is characterized by a single point in spacetime.
$^{7}$ By $\mathrm{d} x^{2}$ we really mean $(\mathrm{d} x)^{2}$, but we write this so often that the convention is to leave out the brackets to save on notational clutter.
$^{8}$ We will work with $\mathrm{d} s^{2}$, rather than taking the square root, in order to avoid dealing with square roots of negative numbers. For brevity, people sometimes refer to the square of the interval $\mathrm{d} s^{2}$ as simply 'the interval' (even though strictly the term refers only to $\mathrm{d} s$). Note also that in quantum field theory it is conventional to define $\mathrm{d} s^{2}$ with the opposite sign to what we have done here, i.e. to write
\begin{equation*}
\mathrm{d} s^{2}=c^{2} \mathrm{d} t^{2}-\mathrm{d} x^{2}-\mathrm{d} y^{2}-\mathrm{d} z^{2},
\end{equation*}
and indeed we have done so in our own Quantum Field Theory for the Gifted Amateur. In this book, we adopt the convention used by most textbooks on general relativity. One might wish there was a common convention between the two fields, but this is the price you pay for exploring a number of topics in physics. It's the same with international motoring: you have to get used to driving on both the left and the right.
$^{9}$ This follows from the fact, discussed above, that if $\mathrm{d} s=0$ in one inertial frame, then $\mathrm{d} s'=0$ in any other.
$^{10}$ That is, $v_{12}=|\vec{v}_{1}-\vec{v}_{2}|$.
Let's now consider not just space but spacetime. As in the previous example, we want to have some notion of length which is unchanged under a rotation in spacetime (whatever that might mean). How do we define a length? Einstein's postulate gives us a clue, because if a light source flashes at $x=y=z=0$ and $t=0$ it will send out light travelling at speed $c$ in all directions. There will therefore be a spherical wave front (Fig. 1.3) defined by
\begin{equation*}
x^{2}+y^{2}+z^{2}=c^{2}t^{2}.
\end{equation*}
Let's now consider two points in spacetime$^{6}$ which are separated only by infinitesimal distances but connected by a light pulse, so that$^{7}$
\begin{equation*}
\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}=c^{2}\mathrm{d}t^{2}.
\end{equation*}
Another way of writing this equation is to put the $c^{2}\mathrm{d}t^{2}$ on the left-hand side so that the quantity that we will call $\mathrm{d}s^{2}$, the square of the spacetime interval or invariant line element,$^{8}$ is
\begin{equation*}
\mathrm{d}s^{2}=-c^{2}\mathrm{d}t^{2}+\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}=0. \tag{1.5}
\end{equation*}
This has been written using the coordinates of some inertial frame, which we can call $S$. In another inertial frame $S'$ our coordinates will change, but Einstein insists that light travels at the same speed in all inertial frames, and so the interval between the same two events is given by
\begin{equation*}
\mathrm{d}s'^{2}=-c^{2}\mathrm{d}t'^{2}+\mathrm{d}x'^{2}+\mathrm{d}y'^{2}+\mathrm{d}z'^{2}=0,
\end{equation*}
or $\mathrm{d}s^{2}=\mathrm{d}s'^{2}$ (both being zero). Remarkably, we can now show in the following example that the spacetime interval is the same in all inertial frames, even if the two points are not connected by a light pulse.
Example 1.2
For events separated by infinitesimal distances in space and time, $\mathrm{d}s^{2}$ and $\mathrm{d}s'^{2}$ can be related using some function$^{9}$ $a(v)$ by $\mathrm{d}s^{2}=a(v)\,\mathrm{d}s'^{2}$. Note that $a$ can't be a function of position or time without violating the principle of the homogeneity of spacetime (every point in spacetime is like any other point). The function $a$ can depend on the velocity $\vec{v}$ between frames $S$ and $S'$, but can't depend on the direction of $\vec{v}$, only on its magnitude $v=|\vec{v}|$, otherwise it would violate the principle of the isotropy of space (no special directions). Now consider three frames: $S$; $S_{1}$, which moves at a speed $v_{1}$ relative to $S$; and $S_{2}$, which moves at a speed $v_{2}$ relative to $S$. We have $\mathrm{d}s^{2}=a(v_{1})\,\mathrm{d}s_{1}^{2}$ and $\mathrm{d}s^{2}=a(v_{2})\,\mathrm{d}s_{2}^{2}$, but we must also have $\mathrm{d}s_{1}^{2}=a(v_{12})\,\mathrm{d}s_{2}^{2}$, where $v_{12}$ is the relative speed$^{10}$ of $S_{1}$ and $S_{2}$. Comparing, we must have
\begin{equation*}
a(v_{12})=\frac{a(v_{2})}{a(v_{1})}. \tag{1.7}
\end{equation*}
However, $v_{12}=\sqrt{v_{1}^{2}+v_{2}^{2}-2v_{1}v_{2}\cos\theta}$, where $\theta$ is the angle between $\vec{v}_{1}$ and $\vec{v}_{2}$. The left-hand side of eqn 1.7 therefore depends on $\theta$, whereas the right-hand side does not, so $a(v)$ cannot depend on $v$ and must be a constant (call it $a$). However, eqn 1.7 now becomes $a=a/a$, which is only true if $a=1$. Thus we conclude that, in general,
\begin{equation*}
\mathrm{d}s^{2}=\mathrm{d}s'^{2}.
\end{equation*}
This book is about general relativity, in which spacetime can be curved, but for this chapter and the next three we will be considering the simplest case$^{11}$ of flat spacetime, in which the geometry considered above extends over all space. Thus, we can consider not just infinitesimal intervals $\mathrm{d}s$ but also the interval $\Delta s$ between more distant points in spacetime, i.e. we can write
\begin{equation*}
\Delta s^{2}=-c^{2}\Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2}. \tag{1.9}
\end{equation*}
A sketch of a spacetime diagram near the origin for this flat spacetime is shown in Fig. 1.4. The set of points that satisfy $\Delta s^{2}=0$ are said to be on the light cone defined by eqn 1.5, since they can be connected to the origin by light rays. Points inside the light cone have $\Delta s^{2}<0$ and can be connected to the origin by particles travelling slower than light. The light cone actually contains two sections, a past light cone and a future light cone. Physical processes at the origin can be affected by anything on or within the past light cone, and processes at the origin can affect anything on or within the future light cone. On the other hand, the set of points outside the light cone (which have $\Delta s^{2}>0$) cannot be causally connected to the origin. We introduce some jargon for these three classes of interval:
$\Delta s^{2}<0$: timelike (inside the light cone);
$\Delta s^{2}=0$: lightlike, or null (on the light cone);
$\Delta s^{2}>0$: spacelike (outside the light cone).
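To make the three classes concrete, here is a minimal Python sketch (not from the text) that evaluates $\Delta s^{2}=-c^{2}\Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2}$ for a pair of events and reports whether their separation is timelike ($\Delta s^{2}<0$), lightlike ($\Delta s^{2}=0$) or spacelike ($\Delta s^{2}>0$). The function names, and the relative tolerance used to decide when a floating-point $\Delta s^{2}$ counts as zero, are our own assumptions.

```python
C = 299_792_458.0  # speed of light in m/s


def squared_interval(dt, dx, dy=0.0, dz=0.0, c=C):
    """Delta s^2 = -c^2 dt^2 + dx^2 + dy^2 + dz^2 (same sign convention as the text)."""
    return -(c * dt) ** 2 + dx**2 + dy**2 + dz**2


def classify(dt, dx, dy=0.0, dz=0.0, c=C, rel_tol=1e-12):
    """Return 'timelike', 'lightlike' or 'spacelike' for the separation (dt, dx, dy, dz).

    rel_tol is an assumed tolerance: a |Delta s^2| smaller than rel_tol times the
    size of the individual terms is treated as zero, to absorb rounding errors.
    """
    s2 = squared_interval(dt, dx, dy, dz, c)
    scale = (c * dt) ** 2 + dx**2 + dy**2 + dz**2
    if abs(s2) <= rel_tol * scale:
        return "lightlike"
    return "timelike" if s2 < 0 else "spacelike"


# Two events one second apart in time, at various spatial separations (metres).
for dx in (1.0e8, C, 5.0e8):
    print(f"dx = {dx:.3e} m: {classify(1.0, dx)}")
```

The tolerance matters only for separations that are lightlike to within rounding error; any clearly timelike or spacelike pair is classified by the sign of $\Delta s^{2}$ alone.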
Let's see how to transform between inertial frames. We shall deal with the $xt$ plane as shown in Fig. 1.5. The point $(x,t)$ is now at an interval $\sqrt{-c^{2}t^{2}+x^{2}}$ from the origin. The interval is somewhat like distance in Example 1.1, but the minus sign in the definition will change things. The analogue of rotating the coordinates, mapping $x\rightarrow x'$ and $t\rightarrow t'$, which preserves the squared interval ($-c^{2}t^{2}+x^{2}=-c^{2}t'^{2}+x'^{2}$), is the linear transformation given by
\begin{equation*}
\begin{pmatrix} ct' \\ x' \end{pmatrix}=\begin{pmatrix} \cosh\theta & \sinh\theta \\ \sinh\theta & \cosh\theta \end{pmatrix}\begin{pmatrix} ct \\ x \end{pmatrix},
\end{equation*}
which works because $\cosh^{2}\theta-\sinh^{2}\theta=1$. This is known as a Lorentz transformation. If frame $S'$ moves$^{12}$ at speed $v\equiv\beta c$ with respect to frame $S$, a particle at rest in $S$ moves with velocity $-v$ in $S'$. Taking such a particle at $x=0$, we have that $x'=ct\sinh\theta$ and $t'=t\cosh\theta$, but $x'/t'=-v$, so we deduce that $v=-c\tanh\theta$, or equivalently $\beta=-\tanh\theta$. This means that with the definition
\begin{equation*}
\gamma=(1-\beta^{2})^{-1/2} \tag{1.12}
\end{equation*}
we have that $\gamma=(1-\beta^{2})^{-1/2}=(1-\tanh^{2}\theta)^{-1/2}=\cosh\theta$ and $\beta\gamma=-\tanh\theta\cosh\theta=-\sinh\theta$. This puts the Lorentz transformation into the more familiar form$^{13}$
\begin{equation*}
ct'=\gamma(ct-\beta x), \qquad x'=\gamma(x-\beta ct). \tag{1.13}
\end{equation*}
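The rapidity form and the $\gamma$ form of the Lorentz transformation can be checked against each other numerically. The sketch below (our own; the event and the value $\beta=0.6$ are arbitrary) writes coordinates as $(ct,x)$ so that both entries are lengths, applies $ct'=ct\cosh\theta+x\sinh\theta$, $x'=ct\sinh\theta+x\cosh\theta$ with $\beta=-\tanh\theta$, and also $ct'=\gamma(ct-\beta x)$, $x'=\gamma(x-\beta ct)$, then confirms that $-c^{2}t^{2}+x^{2}$ is unchanged.

```python
import math


def boost_rapidity(ct, x, theta):
    """Rapidity form: (ct', x') = [[cosh, sinh], [sinh, cosh]] applied to (ct, x)."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    return ch * ct + sh * x, sh * ct + ch * x


def boost_gamma(ct, x, beta):
    """Familiar form: ct' = gamma (ct - beta x), x' = gamma (x - beta ct)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (ct - beta * x), gamma * (x - beta * ct)


def squared_interval(ct, x):
    """-c^2 t^2 + x^2 measured from the origin, in one spatial dimension."""
    return -ct**2 + x**2


beta = 0.6
theta = -math.atanh(beta)      # beta = -tanh(theta)
event = (3.0, 1.0)             # (ct, x), both in metres

r = boost_rapidity(*event, theta)
g = boost_gamma(*event, beta)
print("rapidity form:", r)
print("gamma form:   ", g)
print("interval before/after:", squared_interval(*event), squared_interval(*g))
```

The two forms agree to rounding error, which is just the statement $\cosh\theta=\gamma$ and $\sinh\theta=-\beta\gamma$.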
$^{11}$ This is all that was envisaged in Einstein's 1905 paper on special relativity.
Fig. 1.4 A spacetime diagram near the origin, showing points which are spacelike ($\Delta s^{2}>0$) and timelike ($\Delta s^{2}<0$) separated from the origin. The points with $\Delta s^{2}=0$ lie on the light cone.
Fig. 1.5 The $xt$ plane.
$^{12}$ We define the quantity $\beta$ using
\begin{equation*}
\beta=\frac{v}{c}.
\end{equation*}
$^{14}$ In general it preserves the square of the interval $-c^{2}t^{2}+x^{2}+y^{2}+z^{2}$ in all frames, but here we are just considering one spatial dimension.
$^{15}$ The proof is straightforward. Setting $\Delta t'=0$ in eqn 1.13 gives $\beta=c\Delta t/\Delta x$, which only has a sensible solution ($|\beta|<1$) for spacelike intervals.
$^{16}$ We are therefore forced to drop our common-sense points 1 and 2 from the start of the chapter.
Fig. 1.6 A spacetime diagram showing how special relativity leads to a notion in which the region of spacetime outside the light cone is an extended present.
$^{17}$ This also kills off common-sense point 4.
Fig. 1.7 The quantity $\gamma=(1-\beta^{2})^{-1/2}$ as a function of $\beta=v/c$.
The Lorentz transformation preserves the square of the interval,$^{14}$ $-c^{2}t^{2}+x^{2}$, in all frames. Let's now state a couple of important consequences of this transformation for different types of intervals.
(1) For any two points separated by a spacelike interval, one can find a reference frame$^{15}$ in which their time separation $\Delta t'=0$, i.e. the two events occur simultaneously. Therefore, one can think of the set of points outside the light cone as an extended present, a region of spacetime which is not causally connected to the origin but is potentially simultaneous with it (in some reference frame). We now realize that our notion of 'now' is not a horizontal plane in spacetime as in Fig. 1.1 but comprises everything outside the light cone (see Fig. 1.6). Strangely, we have access to our past and our future, but it is the extended present, the 'now', to which we have no access! Our notions of simultaneity have been dramatically altered.$^{16}$
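The claim that spacelike-separated events are simultaneous in some frame is easy to check numerically. Assuming the familiar form of the transformation, $ct'=\gamma(ct-\beta x)$ and $x'=\gamma(x-\beta ct)$, setting $\Delta t'=0$ requires $\beta=c\Delta t/\Delta x$, which is a physical velocity ($|\beta|<1$) only when $|c\Delta t|<|\Delta x|$. The helper names below are hypothetical and the numbers arbitrary.

```python
import math


def simultaneity_beta(c_dt, dx):
    """Boost parameter beta that makes two events simultaneous (Delta t' = 0).

    From ct' = gamma (ct - beta x), Delta t' = 0 requires beta = c Delta t / Delta x,
    which is a physical speed (|beta| < 1) only for a spacelike separation.
    """
    if abs(c_dt) >= abs(dx):
        raise ValueError("separation is not spacelike: no inertial frame makes it simultaneous")
    return c_dt / dx


def boost(c_dt, dx, beta):
    """Apply ct' = gamma (ct - beta x), x' = gamma (x - beta ct) to a separation."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (c_dt - beta * dx), gamma * (dx - beta * c_dt)


c_dt, dx = 2.0, 5.0                       # spacelike, since |c dt| < |dx| (metres)
beta = simultaneity_beta(c_dt, dx)        # 0.4
c_dt_new, dx_new = boost(c_dt, dx, beta)
print(beta, c_dt_new, dx_new)
```

In the new frame the two events are separated purely in space, by the proper distance $\sqrt{\Delta x^{2}-c^{2}\Delta t^{2}}=\sqrt{21}$ m for these numbers; asking for the same thing with a timelike pair raises an error, because no such frame exists.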
Example 1.4
Spacelike intervals can be measured using rulers. A ruler is a device for measuring a spacelike length $\Delta x$. (Length is the difference in two spatial coordinates evaluated at the same value of the time coordinate.) If the ruler is stationary in frame $S'$ and has length $L$, then it doesn't matter when you measure the location of its two ends. If the ruler is moving, then it can still be used to measure distances, but it is then critical that you measure its two ends at the same time in $S$. Thus, setting $\Delta x'=L$ and $\Delta t=0$, eqn 1.13 yields
\begin{equation*}
L=\gamma(\Delta x-\beta c\,\Delta t)=\gamma\,\Delta x,
\end{equation*}
and hence $\Delta x=L/\gamma$ and the moving ruler is shorter than it is in its rest frame (remember, $\gamma\geq1$; see Fig. 1.7). This effect is known as Lorentz contraction. The ruler's length when it is stationary, $L$, is called the rest length or proper length.$^{17}$
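The measurement procedure can be mimicked directly. In this sketch (our own helper names, units with coordinates written as $ct$ and $x$ in metres) the ruler's ends sit at $x'=0$ and $x'=L$ in $S'$; inverting $x'=\gamma(x-\beta ct)$ gives $x=x'/\gamma+\beta ct$, so reading both ends at one common time $ct$ in $S$ gives a separation $L/\gamma$, whatever time is chosen.

```python
import math


def lorentz_gamma(beta):
    """gamma = (1 - beta^2)^(-1/2), valid for |beta| < 1."""
    return 1.0 / math.sqrt(1.0 - beta**2)


def ruler_ends_in_S(rest_length, beta, ct):
    """Positions in S, read at one common time ct, of the ends of a ruler at rest in S'.

    The ends sit at x' = 0 and x' = rest_length. Inverting x' = gamma (x - beta ct)
    gives x = x' / gamma + beta ct.
    """
    g = lorentz_gamma(beta)
    return beta * ct, rest_length / g + beta * ct


L, beta = 1.0, 0.6
left, right = ruler_ends_in_S(L, beta, ct=7.0)
print(right - left, L / lorentz_gamma(beta))   # both are L / gamma
```

Reading the two ends at different times in $S$ would mix in the ruler's motion, which is why the simultaneous reading is essential.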
(2) For two points separated by any timelike interval (which has negative $\Delta s^{2}$), the straight-line path between those two points represents the longest 'distance' (i.e. the largest value of $\sqrt{-\Delta s^{2}}$) between them, so that small deviations from this path result in a shorter interval. This surprising result is related to the famous twin paradox and we will explore this in Example 1.6. Before that, we will explain how time is measured in special relativity.
Example 1.5
For a timelike interval, $\Delta s^{2}<0$, it is helpful to define a real quantity $\Delta\tau$ (with units of time) by
\begin{equation*}
\Delta s^{2}=-c^{2}\Delta\tau^{2}.
\end{equation*}
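We call $\tau$ the proper time because it yields the time in the rest frame of a particular particle; it is measured using a clock in that reference frame. In a general frame, we define the interval by eqn 1.9 ($\Delta s^{2}\equiv-c^{2}\Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2}$), but by the invariance of the interval
\begin{equation*}
-c^{2}\Delta\tau^{2}=-c^{2}\Delta t^{2}+\Delta x^{2}+\Delta y^{2}+\Delta z^{2},
\end{equation*}
so that, for a clock moving at speed $v=\sqrt{\Delta x^{2}+\Delta y^{2}+\Delta z^{2}}/\Delta t$ in this frame,
\begin{equation*}
\Delta t=\frac{\Delta\tau}{\sqrt{1-v^{2}/c^{2}}}=\gamma\,\Delta\tau. \tag{1.17}
\end{equation*}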
This demonstrates an effect known as time dilation,$^{18}$ showing that the time elapsed between two events on a clock's world line is shortest when measured in the clock's rest frame: in any frame in which the clock moves, the elapsed time is longer by a factor $\gamma$. This effect is sometimes remembered using the slogan 'moving clocks run slow'. This phrase sometimes causes confusion. Clocks run at a particular rate in their rest frames; it is only when they are viewed from reference frames in which they are moving that they are deduced to be slowed down.$^{19}$
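A short numerical sketch of eqn 1.17 for the cosmic-ray muons discussed in this section: assuming the quoted proper lifetime of $2.2\,\mu\mathrm{s}$ and an illustrative speed $\beta=0.98$ (chosen so as not to duplicate Exercise 1.2), it compares the mean distance travelled in the ground frame with and without the factor $\gamma$. The function names are our own.

```python
import math

C = 299_792_458.0      # speed of light, m/s
TAU_MUON = 2.2e-6      # proper mean lifetime of a muon, s (value used in the text)


def dilated_time(proper_time, beta):
    """Delta t = gamma * Delta tau for a clock moving at speed beta * c."""
    return proper_time / math.sqrt(1.0 - beta**2)


def mean_decay_distance(beta, dilate=True):
    """Mean distance (m) covered in the ground frame before a muon decays."""
    lifetime = dilated_time(TAU_MUON, beta) if dilate else TAU_MUON
    return beta * C * lifetime


beta = 0.98  # an illustrative speed
print(f"without time dilation: {mean_decay_distance(beta, dilate=False):7.0f} m")
print(f"with time dilation:    {mean_decay_distance(beta):7.0f} m")
```

The ratio of the two distances is exactly $\gamma$, so the effect grows rapidly as $\beta\to1$.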
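For any deviation from the straight-line path, the elapsed time will be shorter, because additional segments of spatial motion reduce the value of the elapsed proper time. We can treat this in general using eqn 1.17 by writing the time elapsed $\tau$ along a path in spacetime (between two points $\alpha$ and $\beta$) as
\begin{equation*}
\tau=\int_{\alpha}^{\beta}\frac{\mathrm{d}t}{\gamma(t)}=\int_{\alpha}^{\beta}\sqrt{1-\frac{v(t)^{2}}{c^{2}}}\,\mathrm{d}t,
\end{equation*}
where $v(t)$ is the observer's speed at time $t$.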
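The proper time $\tau=\int\sqrt{1-v(t)^{2}/c^{2}}\,\mathrm{d}t$ can be evaluated numerically for any speed profile. The sketch below is our own helper, using the midpoint rule and units with $c=1$ (so that, for example, times are in years and distances in light-years); the smooth profile reaching $0.9c$ is an arbitrary illustration.

```python
import math


def proper_time(speed, t_start, t_end, steps=10_000, c=1.0):
    """Midpoint-rule estimate of tau = integral of sqrt(1 - v(t)^2 / c^2) dt.

    speed: callable giving the speed v(t) in the chosen frame; must satisfy |v| < c.
    """
    h = (t_end - t_start) / steps
    tau = 0.0
    for k in range(steps):
        v = speed(t_start + (k + 0.5) * h)
        tau += math.sqrt(1.0 - (v / c) ** 2) * h
    return tau


T = 10.0


def profile(t):
    """A clock that smoothly speeds up to 0.9c and slows down again over time T."""
    return 0.9 * math.sin(math.pi * t / T)


print(proper_time(profile, 0.0, T))          # less than the coordinate time T
print(proper_time(lambda t: 0.6, 0.0, T))    # constant speed: expect T / gamma = 8, up to rounding
```

For a constant speed the integral reduces to $T/\gamma$, recovering eqn 1.17; any motion at all makes $\tau$ smaller than the coordinate time $T$.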
Example 1.6
The ideas from the last example can be used to resolve the famous twin paradox.$^{21}$ Consider two twins A and B whose clocks are synchronized. Twin A remains on Earth, while twin B is briefly accelerated to speed $v$ and travels to Proxima Centauri at a distance $x^{*}$ from Earth (journey time $x^{*}/v$ in A's frame). B is then briefly decelerated and made to return home with velocity $-v$ (arriving home after a total journey time of $2x^{*}/v$ in A's frame). Both twins age at the same rate, according to their own individual clocks. However, when they meet at the end of B's journey they find that twin A has aged more than twin B. From A's perspective, B's clock runs slow (time dilation), so that one hour experienced by B is $\gamma>1$ hours for A. But couldn't B argue that, from their perspective, it was B who remained stationary and A who did all the travelling? The resolution is that A and B do not have identical experiences; while A has remained at rest in a single inertial frame, B has not, as the accelerometer in B's spacecraft will have recorded. Thus, there is no paradox because the situations are not symmetric. Because the interval is frame-independent, it suffices to work it out in A's frame (see Fig. 1.8). The straight-line path of A yields $\Delta s_{\mathrm{A}}^{2}=-c^{2}(2x^{*}/v)^{2}$, corresponding to a total time of $2x^{*}/v$. The more circuitous path taken by B has two segments, each of which has $\Delta s_{\mathrm{B}}^{2}=x^{*2}-c^{2}(x^{*}/v)^{2}=-c^{2}(x^{*}/v\gamma)^{2}$, leading to a total time interval of $2x^{*}/v\gamma$, which is indeed a factor of $\gamma$ down from A's time interval (as we deduced from appreciating that B's clocks run slow in A's frame).
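The interval bookkeeping for the twins can be sketched in a few lines. Assuming units with $c=1$ (distances in light-years, times in years), an illustrative speed $v=0.8c$, a distance of 4.24 light-years to Proxima Centauri, and, as in the text, neglecting the brief accelerations, the function below (our own) computes the proper time along each world line as $\sqrt{-\Delta s^{2}}$.

```python
import math


def twin_proper_times(distance, beta):
    """Proper times elapsed for twins A (stays at home) and B (out and back).

    Units with c = 1, e.g. distance in light-years and times in years.
    A: Delta s_A^2 = -(2 x*/v)^2.  Each of B's two legs: Delta s^2 = x*^2 - (x*/v)^2.
    Proper time along a timelike segment is sqrt(-Delta s^2).
    """
    leg_time = distance / beta                 # coordinate time per leg in A's frame
    ds2_A = -(2.0 * leg_time) ** 2
    ds2_leg = distance**2 - leg_time**2        # negative, since beta < 1
    return math.sqrt(-ds2_A), 2.0 * math.sqrt(-ds2_leg)


tau_A, tau_B = twin_proper_times(4.24, 0.8)    # Proxima Centauri is about 4.24 light-years away
print(f"A ages {tau_A:.2f} years, B ages {tau_B:.2f} years, ratio {tau_A / tau_B:.4f}")
```

The ratio of the two proper times is $\gamma=5/3$ for $v=0.8c$, independent of the distance.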
The fact that B's world line (the path through spacetime) appears longer than A's world line in Fig. 1.8, and yet corresponds to less elapsed time, is all due to the minus sign in the expression for the interval.
$^{18}$ The same result as in the previous example can also be obtained directly from the Lorentz transformation. Any timelike interval (involving $\Delta x$, $\Delta t$) can be turned into one involving zero spatial distance using an appropriate Lorentz transformation into a frame with $\beta=\Delta x/(c\Delta t)$ (so that $\Delta x'=0$, using eqn 1.13), leading to $\Delta\tau\equiv\Delta t'=\Delta t/\gamma$ (in agreement with eqn 1.17). The squared interval for this is then $\Delta s^{2}=\Delta x^{2}-c^{2}\Delta t^{2}=-c^{2}\Delta t^{2}/\gamma^{2}=-c^{2}\Delta\tau^{2}$.
$^{19}$ A famous example is the cosmic ray muon, which is generated in the upper atmosphere and makes it down to ground level. Muons have a lifetime of $2.2\,\mu\mathrm{s}$ in their rest frames, which serves as their clock. Even if they travelled at the speed of light, they should only make it about $c\times2.2\,\mu\mathrm{s}\approx660\,\mathrm{m}$ down from the top of the atmosphere. The fact that many arrive on the ground is due to the time dilation effect; their clocks seem to be running slowly due to their high speed (large $\gamma$). In the muon's reference frame, the effect is due to the Lorentz contraction of the atmosphere,$^{20}$ which is rushing towards it!
$^{20}$ Common-sense point 3 must now bite the dust too!
$^{21}$ The twin paradox, as we shall see, is only an apparent paradox.
Fig. 1.8 A spacetime diagram for the twin paradox.
Fig. 1.9 A path through spacetime.
$^{22}$ This clock may be a wristwatch, a radioactive source that measurably decays in activity as time increases, or may simply be the fact that the observer is slowly ageing.
Fig. 1.10 The function $y(x)$ minimizes $I$. We consider small deviations from $y(x)$ given by $y(x)+\epsilon\eta(x)$ (dashed line), where $\eta(x)$ vanishes at $x=a$ and $x=b$ and $\epsilon$ is a small parameter.
$^{23}$ Note, though, that this condition will give us the extremal value of the integral, which could be a minimum or a maximum (or, and this becomes important in higher dimensions, a saddle point).
1.4 Paths through spacetime
We have seen that events are points in spacetime, and thus are instantaneous in time and localized in space. Events are witnessed and recorded by observers, who are each equipped with some kind of clock$^{22}$ which tracks the time in the observer's reference frame (i.e. measures the observer's proper time). The path the observer takes through spacetime (Fig. 1.9) is a chain of events connected by infinitesimal timelike intervals (the observer's speed has to be less than $c$), and this path is known as the observer's world line.
We can now ask a simple question about paths through spacetime: what is the shortest distance between two points? This can be worked out using a technique in mathematics known as the calculus of variations and we review this in the following example for the simple case of usual flat (or Euclidean) space.
Example 1.7
In the calculus of variations, one deals with an integral of the form $I=\int_{a}^{b}F(y(x),y'(x),x)\,\mathrm{d}x$, where $y'=\mathrm{d}y/\mathrm{d}x$. We want to find the form of $y(x)$ that minimizes $I$, while ensuring that $y(a)$ and $y(b)$ are fixed (see Fig. 1.10). The method assumes that you can make small variations to $y(x)$ by adding a tiny bit of another function to it, so that
\begin{equation*}
y(x)\rightarrow y(x)+\epsilon\eta(x),
\end{equation*}
where $\epsilon$ is a small number and $\eta(x)$ must vanish at $x=a$ and $x=b$. Then we look for the condition$^{23}$
\begin{equation*}
\left.\frac{\mathrm{d}I}{\mathrm{d}\epsilon}\right|_{\epsilon=0}=0 \quad \text{for all } \eta(x) \tag{1.20}
\end{equation*}
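Differentiating under the integral sign, integrating the $\eta'$ term by parts and using $\eta(a)=\eta(b)=0$, this condition becomes $\int_{a}^{b}\eta(x)\left[\frac{\partial F}{\partial y}-\frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{\partial F}{\partial y'}\right)\right]\mathrm{d}x=0$ for all $\eta(x)$, which requires
\begin{equation*}
\frac{\partial F}{\partial y}-\frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{\partial F}{\partial y'}\right)=0.
\end{equation*}
This is known as the Euler-Lagrange equation.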
We can now apply this to the case of the shortest distance between two points in Euclidean space. The length $\ell$ of a path between two points is given by
\begin{equation*}
\ell=\int_{a}^{b}\sqrt{1+y'^{2}}\,\mathrm{d}x.
\end{equation*}
The integrand $F=\sqrt{1+y'^{2}}$ is a function of $y'$, not $y$, and so $\partial F/\partial y=0$ and we can work out that $\partial F/\partial y'=y'/\sqrt{1+y'^{2}}$. The Euler-Lagrange equation then gives $\mathrm{d}/\mathrm{d}x\,(y'/\sqrt{1+y'^{2}})=0$, which is solved by $y'=$ constant (let's call it $m$). The solution is then $y=mx+c$, where $c$ is another constant, and so is evidently a straight line.
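A numerical check of this result, as a sketch: perturb the straight line $y=mx+c$ by $\epsilon\eta(x)$ with an $\eta$ that vanishes at both ends, and approximate the length $\ell=\int\sqrt{1+y'^{2}}\,\mathrm{d}x$ by a fine polyline. The choice $\eta(x)=\sin[\pi(x-a)/(b-a)]$ and the helper names are our own, and the constant is called `c0` in the code to avoid confusion with the speed of light.

```python
import math


def path_length(y, a, b, steps=10_000):
    """Approximate the arc length of y(x) on [a, b] by summing short chords."""
    h = (b - a) / steps
    total, prev = 0.0, y(a)
    for k in range(1, steps + 1):
        cur = y(a + k * h)
        total += math.hypot(h, cur - prev)
        prev = cur
    return total


a, b = 0.0, 1.0
m, c0 = 2.0, 1.0                                   # straight line y = m x + c0


def eta(x):
    """Perturbation that vanishes at x = a and x = b."""
    return math.sin(math.pi * (x - a) / (b - a))


lengths = {}
for eps in (0.0, 0.05, 0.1, 0.2):
    lengths[eps] = path_length(lambda x, e=eps: m * x + c0 + e * eta(x), a, b)
    print(f"epsilon = {eps:4.2f}: length = {lengths[eps]:.6f}")
```

The straight line gives $\sqrt{1+m^{2}}=\sqrt{5}$ here, and every perturbation lengthens the path, consistent with the stationary point being a minimum in Euclidean space.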
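We can now use this technique for working out the shortest interval between two points in spacetime (in the special case of a single spatial dimension). The proper time elapsed along a path between two points is
\begin{equation*}
\tau=\int\sqrt{\mathrm{d}t^{2}-\frac{\mathrm{d}x^{2}}{c^{2}}}=\int_{t_{1}}^{t_{2}}\sqrt{1-\frac{1}{c^{2}}\left(\frac{\mathrm{d}x}{\mathrm{d}t}\right)^{2}}\,\mathrm{d}t,
\end{equation*}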
and so is a bit different from the Euclidean case. However, application of the Euler-Lagrange equation also gives a straight-line solution. Here we have to remember that the Euler-Lagrange equation identifies a stationary solution, and in this case the solution is a maximum, not a minimum. We can prove that very simply: consider a timelike interval between the origin $(0,0)$ and the point $(x,t)$. One can move to a frame$^{24}$ in which this interval is purely along the time axis, whereupon it becomes $(0,t/\gamma)$. The straight-line path thus corresponds to an elapsed time of $t/\gamma$. Any deviation from this straight-line path will result in a shorter elapsed time, because excursions along the $x$-axis reduce the elapsed proper time, since $\mathrm{d}\tau=\sqrt{(\mathrm{d}t)^{2}-(\mathrm{d}x)^{2}/c^{2}}$. This is, of course, the twin paradox all over again.
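The maximum property can also be seen numerically. This sketch (our own helpers, units with $c=1$) treats a world line $x(t)$ as a chain of short straight segments, each contributing $\mathrm{d}\tau=\sqrt{\mathrm{d}t^{2}-\mathrm{d}x^{2}/c^{2}}$, and compares the straight world line from $(0,0)$ to $(t,x)=(10,6)$ with versions carrying a wiggle $\epsilon\sin(\pi t/T)$ that vanishes at both ends. The endpoints and the wiggle are arbitrary choices, kept small enough that the speed stays below $c$.

```python
import math


def proper_time_along(x_of_t, T, steps=20_000, c=1.0):
    """Proper time along a world line x(t) from t = 0 to t = T.

    The world line is treated as a chain of short straight segments; each
    contributes d tau = sqrt(dt^2 - dx^2 / c^2).  Requires |dx/dt| < c throughout.
    """
    h = T / steps
    tau, x_prev = 0.0, x_of_t(0.0)
    for k in range(1, steps + 1):
        x_next = x_of_t(k * h)
        tau += math.sqrt(h**2 - ((x_next - x_prev) / c) ** 2)
        x_prev = x_next
    return tau


T, X = 10.0, 6.0        # end event (t, x) = (10, 6) in units with c = 1


def world_line(eps):
    """Straight line from (0, 0) to (T, X) plus a wiggle that vanishes at both ends."""
    return lambda t: X * t / T + eps * math.sin(math.pi * t / T)


taus = {eps: proper_time_along(world_line(eps), T) for eps in (0.0, 0.5, 1.0)}
for eps, tau in taus.items():
    print(f"epsilon = {eps:3.1f}: tau = {tau:.6f}")
```

The straight world line gives $T/\gamma=8$ for these numbers, and the wiggles give strictly less, in contrast with the Euclidean case, where perturbing the straight line made it longer.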
We will often parametrize paths using the proper time $\tau$ as a way of recording how far along a path an observer has travelled. Of course, you can set the zero of proper time any way you wish, and you can measure time in units of seconds, hours, or months as you please. For this reason, any affine$^{25}$ scaling of $\tau$ will do. An affine transformation of $\tau$ can be written as
\begin{equation*}
\lambda=a\tau+b,
\end{equation*}
where $a$ and $b$ are real numbers (with $a\neq0$) and our new affine parameter $\lambda$ is just the proper time in different units with a different zero of time. We will have cause to use affine parameters later on when we tackle general relativity.
1.5 Experiments
In this chapter, we have outlined the consequences of Einstein's bold vision of 1905 that led to the formulation of special relativity. Why should we believe any of this? The answer is that this theory agrees spectacularly well with experiment, although most of the experiments were done after 1905. In this section, we briefly summarize some of these.
The speed of light is absolute and constant: The Michelson-Morley experiment (1887) demonstrated that the total time for light to traverse, in free space, a distance $\ell$ and to return back again is independent of its direction. This was accomplished by allowing light to travel back and forth along two perpendicular arms of equal length in a Michelson interferometer.$^{26}$ The Kennedy-Thorndike experiment (1932) was a modification in which the arms of the interferometer are of unequal length. This experiment shows that the time for light to traverse a closed path is independent not only of the orientation of the apparatus but also of its velocity. Modern versions$^{27}$ of this experiment frequently use two lasers, one locked to a well-known transition (such as a molecular absorption line, with frequency $\nu_{\text{ref}}$) and the other locked to a very stable Fabry-Pérot reference cavity (with frequency $\nu_{\text{cav}}=nc(v)/(2\ell)$, where $n$ is the mode number and $\ell$ is the length of the cavity, the speed of light $c(v)$ being allowed the possibility of depending on $v$). The difference between these two frequencies is measured precisely and monitored over time (as the laboratory velocity changes while the Earth rotates on its axis and orbits the Sun).
$^{24}$ A frame moving with velocity $x/t$, so that $\gamma$ is given by $[1-(x/ct)^{2}]^{-1/2}$.
$^{25}$ The word affine comes from the Latin affinis meaning 'related to' or 'connected with'.
$^{26}$ Although justly lauded as a landmark experiment and known to Einstein, it is not clear that the Michelson-Morley experiment was a major influence on his thinking. The book by Cheng discusses Einstein's motivations.
$^{27}$ A good example can be found in C. Braxmaier et al., Phys. Rev. Lett. 88, 010401 (2002).
$^{28}$ C. W. Chou, D. B. Hume, T. Rosenband and D. J. Wineland, Science 329, 1630 (2010).
$^{29}$ A review of recent results can be found in S. Liberati, Class. Quantum Grav. 30, 133001 (2013).
Time dilation does occur: The Ives-Stillwell experiment (1938) used the Doppler shift in light from a moving source (accelerated ions) to infer time dilation. Time dilation is also used to interpret the flux of cosmic muons, as discussed earlier, though modern experiments use muon beams in accelerators (and from the muons' perspective, where the accelerator beamline is Lorentz contracted, this demonstrates Lorentz contraction). Modern Ives-Stillwell-type experiments have used heavy-ion storage rings and laser spectroscopy to improve precision. A particularly elegant version$^{28}$ uses very slowly moving ions together with extremely accurate spectroscopy. Two optical clocks based on laser-cooled $\mathrm{Al}^{+}$ ions are operated, but in one of them the $\mathrm{Al}^{+}$ ion is given a velocity by an applied static electric field. The frequencies of the two clocks can be measured (to an accuracy of $10^{-17}$) and accurately compared, providing agreement with Einstein's theory even though the velocity of one of the ions is only a rather sluggish $\approx10\,\mathrm{m\,s^{-1}}$.
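A back-of-the-envelope sketch shows why such precision is needed: the moving clock's rate differs from the stationary one by a fraction of order $\gamma-1\approx v^{2}/(2c^{2})$, which for the quoted $v\approx10\,\mathrm{m\,s^{-1}}$ is about $5.6\times10^{-16}$, comfortably above the $10^{-17}$ measurement accuracy. This ignores every other systematic effect in the real experiment. The helper `gamma_minus_one` is our own; it uses `math.log1p` and `math.expm1` because evaluating $1/\sqrt{1-\beta^{2}}-1$ directly loses much of its precision in double-precision arithmetic when $\beta\sim10^{-8}$.

```python
import math

C = 299_792_458.0   # speed of light, m/s


def gamma_minus_one(v):
    """gamma - 1 for speed v (m/s), avoiding cancellation when v << c.

    gamma - 1 = (1 - beta^2)^(-1/2) - 1 = expm1(-0.5 * log1p(-beta^2)).
    """
    beta2 = (v / C) ** 2
    return math.expm1(-0.5 * math.log1p(-beta2))


v = 10.0                                   # m/s, the ion speed quoted in the text
shift = gamma_minus_one(v)                 # fractional slowing of the moving clock
leading_order = v**2 / (2.0 * C**2)        # v^2 / (2 c^2)
print(f"gamma - 1 = {shift:.3e}  (v^2/2c^2 = {leading_order:.3e})")
```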
Lorentz invariance holds: Numerous experiments have been performed to test Lorentz invariance to a high level of precision. No significant departure from Lorentz invariance has yet been found.$^{29}$
Relativity has been used for more than a century: This is not a good argument on its own, as Newtonian physics had survived unscathed for more than two centuries but was eventually superseded. However, Newtonian physics still has a very wide domain of applicability (and we now understand the limits of that domain). Relativity may one day meet its match (and we expect an as-yet unformulated theory of quantum gravity will take its place), but it has so far proved reliable in the design and operation of particle accelerators, the understanding of phenomena in astrophysics, telecommunications, the space programme and condensed matter physics. The experiments we've described above have stringently tested many aspects of relativity, and we have now accumulated ample evidence that it works across many branches of physics.
Chapter summary
The speed of light is the same in all inertial frames. This has profound consequences for the nature of reality, including time dilation, length contraction, and a revolution in our notion of simultaneity and the meaning of the present.
In relativity we deal with events. The history of a particle, given in terms of events, forms its world line.
The square of the invariant interval, $\mathrm{d}s^{2}$, between two events will be identical, no matter which coordinate system is used to evaluate it.
The straight-line world line between two timelike separated points maximizes the interval. Deviations from this result in a smaller interval and hence elapsed time (which helps explain the twin paradox).
A light cone is defined by $\mathrm{d}s^{2}=0$.
The predictions of special relativity have been tested in detail and the theory is strongly supported by substantial experimental evidence.
Exercises
(1.1) Review the theory of special relativity and the derivations for the breakdown of simultaneity, the extended present, time dilation, the Lorentz contraction and the twin paradox. Give a critique of the 'common sense' statements in Section 1.1.
(1.2) The proper mean lifetime of a muon is $2.2\,\mu\mathrm{s}$. Muons are formed in the upper atmosphere due to the collision of cosmic rays with molecules in the atmosphere. If such muons travel down to the Earth's surface with a speed of $0.995c$, calculate their mean distance travelled before decaying (a) ignoring the effect of time dilation and (b) including the effect of time dilation.
(1.3) We would like to measure the interval $\Delta s$ between events $p$ (on our world line) and $q$ (not on our world line), using only a clock and a light pulse. To do this we emit a light pulse at event $r$ which strikes event $q$ and is reflected back, meeting our world line at event $u$. We measure the proper time interval between $r$ and $p$, which we call $\tau_{2}$, and the proper time interval between $p$ and $u$, which we call $\tau_{1}$. Show that $\Delta s^{2}=c^{2}\tau_{1}\tau_{2}$.
(1.4) The quantity $(\gamma-1)$ provides a measure of the difference between special-relativistic and Newtonian mechanics. What values of $\beta$ are needed to obtain a value of $(\gamma-1)$ equal to (a) 0.01, (b) 0.1, (c) 1, (d) 10, (e) 100?
2
2.1 Vectors
2.3 Examples of vectors
Exercises
Fig. 2.1 We can't draw spacetime very accurately since it has $3+1$ dimensions, but here is an attempt. In this diagram, the three spatial dimensions have been flattened, unceremoniously, into a plane (shaded). The path of a photon ($\gamma$) is also shown.
$^{1}$ The convention used is to write the index as a superscript, i.e. it goes in the upstairs position. Keep an eye out for whether an index goes upstairs or downstairs because this will have a significance that we will explain later in this chapter.
$^{2}$ In other words, they depend on the reference frame used.
Vectors in flat spacetime
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take Arms against a Sea of troubles ...
William Shakespeare (1564-1616) Hamlet (Act III, Scene I)
In special relativity, we are dealing with flat spacetime because gravity is ignored. Let's consider what kind of physical quantities might exist in such a spacetime (see Fig. 2.1). The first type we might think of is a scalar. A scalar is simply a number, and takes the same value in every inertial frame. It is thus said to be Lorentz invariant. Examples of scalars include the electric charge and rest mass of a particle.
The second type of quantity is the subject of this chapter: a vector. This quantity can be thought of geometrically as an arrow in spacetime. However, we might also wish to choose a particular reference frame and describe the components of this vector with respect to a particular basis. To do this we will need to specify a coordinate system in which to work. Because we are dealing (for now) with flat spacetime, a choice of coordinates made in one part of spacetime will work throughout the whole of spacetime. As we shall see later, this rather convenient property will not work in a curved spacetime, and there our coordinates will generally only apply locally. (In the same way, a local map of New York, printed on a two-dimensional sheet of paper, cannot be extended to the whole Earth because the planet is spherical.)
Example 2.1
We have seen that the basic currency of relativity is the event. Examples of events include the emission of a photon, receiving a photon, hearing a loud noise or being shot by an arrow. Events are witnessed and recorded by observers. The simplest class of events occurs directly at the point in space occupied by the observer carrying a clock. The observer assigns a time, as measured on their clock, to the event. Once we have coordinate frames at our disposal, we can record events that occur at different points in spacetime as well as the intervals that separate them. We record the events in terms of the position on the coordinate grid and the time on the clock at that position. Events can therefore be expressed in the coordinates$^{1}$ of some frame
\begin{equation*}
x^{\mu}=(x^{0},x^{1},x^{2},x^{3})=(ct,x,y,z),
\end{equation*}
which will sometimes be written as $x^{\mu}=(ct,\vec{x})$, using the notation of the 3-vector $\vec{x}$ with spatial coordinates $x^{i}=(x,y,z)$, taken from the end of the alphabet. The location of an event in spacetime can be described by a 4-vector $\boldsymbol{x}$ considered as an arrow in spacetime (stretching, say, from the origin to the event). The particular coordinates $x^{\mu}$ relating to $\boldsymbol{x}$ depend on the basis chosen.$^{2}$
Note that in this chapter, and from now on unless otherwise indicated, we choose units such that $c=1$.
A vector isn't just any old collection of components. It is an object that has to transform appropriately under coordinate transformations.$^{3}$ In flat spacetime, 4-vectors are made from a timelike part and a spacelike part and are displayed in bold italics, so a position in spacetime is written as $\boldsymbol{x}$, where $\boldsymbol{x}$ has components $x^{\mu}=(t,\vec{x})$. Components of 4-vectors are given a Greek index, so for example $x^{\mu}$ where $\mu=0,1,2,3$. In the jargon, $x^{0}$ is the timelike component, and $x^{1}$, $x^{2}$ and $x^{3}$ are the spacelike components. The spacelike components themselves form a 3-vector, whose components are given a Roman index such as $x^{i}$, where $i=1,2,3$.
2.1 Vectors
In this chapter, we are going to consider the role of vectors in special relativity. We can think of a vector in special relativity as an arrow in spacetime. If we have two events at points $\mathcal{A}$ and $\mathcal{B}$ in flat spacetime, then we can define a vector$^{4}$ that points from $\mathcal{A}$ to $\mathcal{B}$ by
Defined in this way, a vector lives independently of any coordinate system. The vector points from the event at point A\mathcal{A} to an event at B\mathcal{B}, no matter what time and space coordinates we assign to the events (see Fig. 2.2). In order to express the vector in terms of coordinates, we need to define a set of basis vectors which we shall denote ^(5){ }^{5} by e_(mu)\boldsymbol{e}_{\mu}.
Example 2.2
Old-fashioned 3-vectors in Euclidean three-dimensional space are written as
Note that components are given upstairs indices, while basis vectors are given downstairs indices. As a result, the scalar product (or dot product) is written as
where in the last equality we have used the Einstein summation convention, by which index variables (like $\mu$) repeated in both the upstairs and downstairs positions are assumed to be summed.
$^{3}$ A good counterexample is the two-component 'shopping vector' that contains the price of fish and the price of bread in each component. If you approach the supermarket checkout with the trolley at $45^{\circ}$ to the vertical, you will soon discover that the prices of your shopping will not transform appropriately. To use the jargon introduced earlier, vectors have to transform covariantly (see the discussion on page 3), and our 'shopping vector' fails. It isn't a vector at all, just a couple of numbers surrounded by brackets.
$^{4}$ This makes it look like vectors and intervals are very similar, and so they are, at this stage. We'll see, however, that they lose this similarity when we start to look at curved spacetimes.
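The summation convention can be made concrete with a short sketch for ordinary Euclidean 3-vectors. The Python below is illustrative only (the helper names are ours, not the book's): it writes $\delta_{ij}a^{i}b^{j}$ out as explicit loops over both repeated indices, using the Kronecker delta of footnote 6.

```python
# Sketch: the Einstein summation convention spelled out as explicit loops.
# A repeated index (one upstairs, one downstairs) means "sum over its values".


def kronecker_delta(i, j):
    """delta_{ij}: 1 when i == j and 0 otherwise."""
    return 1 if i == j else 0


def scalar_product(a, b):
    """a . b = delta_{ij} a^i b^j, summed over both repeated indices i and j."""
    n = len(a)
    return sum(kronecker_delta(i, j) * a[i] * b[j]
               for i in range(n) for j in range(n))


a = [1.0, 2.0, 3.0]   # components a^i
b = [4.0, -1.0, 0.5]  # components b^i
print(scalar_product(a, b))  # 1*4 + 2*(-1) + 3*0.5 = 3.5
```

Only the terms with $i=j$ survive, which is why the double sum collapses to the familiar $a^{1}b^{1}+a^{2}b^{2}+a^{3}b^{3}$.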
Fig. 2.2 A vector $\boldsymbol{X}$ lives free of any coordinate system. We can, however, impose a coordinate system and express a vector in terms of basis vectors $\boldsymbol{e}_{\mu}$ and its components $X^{\mu}$.
$^{5}$ The $\mu$ in $\boldsymbol{e}_{\mu}$ tells us which basis vector we're dealing with, rather than telling us which component of a vector we're talking about.
$^{6}$ We define the symbol $\delta_{ij}$ such that $\delta_{ij}=1$ when $i=j$ and $\delta_{ij}=0$ otherwise. It is known as the Kronecker delta.
$^{7}$ Although the prime is written on the index, the components $X^{\sigma'}$ and basis vectors $\boldsymbol{e}_{\sigma'}$ refer to a different coordinate system (the primed coordinate system) from that of $X^{\mu}$ and $\boldsymbol{e}_{\mu}$. Putting the prime on the indices, rather than on the variables themselves, might seem like an odd choice, but it will turn out to be very useful when we start dealing with more complicated equations.
Fig. 2.3 The unprimed and primed coordinate systems, showing just one spatial direction.
$^{8}$ This equation can be written as
Note that all the coordinate transformations we are considering are ones that preserve the origin, so that an event at $\vec{x}=0$ and $t=0$ in frame $S$ is mapped into $\vec{x}'=0$ and $t'=0$ in frame $S'$.
$^{9}$ This is an example of the famous relation for differentials, $\mathrm{d}f=\frac{\partial f}{\partial x}\mathrm{d}x+\frac{\partial f}{\partial y}\mathrm{d}y+\frac{\partial f}{\partial z}\mathrm{d}z$ for a function $f(x, y, z)$.
$^{10}$ In this chapter, the only transformation we shall consider is the Lorentz transformation. It will turn out that this rule applies more generally to components of vectors although, as discussed in the next chapter, it applies to the position vector only in special cases.
2.2 Coordinate transformations
Since vectors $\boldsymbol{X}$ exist independently of bases and coordinates, they can be expressed in different coordinate systems (see Fig. 2.3) via a different set of basis vectors$^{7}$
Special relativity is based on the observation that the components of 4-vectors transform between inertial frames according to the Lorentz transformations
where we represent the Lorentz transformations in component form using $\Lambda^{\mu'}{}_{\nu}$, which are functions of the relative velocity $v\,(=\beta)$ of the frames.
Example 2.3
The Lorentz transformation for the coordinates of an event in a frame $S$ and a frame $S'$ (moving relative to frame $S$ at speed $\beta$ along the $x$-axis) can be rewritten in matrix form as
where $\Lambda^{\mu'}{}_{\nu}$ is the Lorentz transformation matrix. Here we have again used the Einstein summation convention, and the repeated index that is summed over is $\nu$.$^{8}$
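As a minimal sketch, assuming the standard boost along $x$ in which $t'=\gamma(t-\beta x)$ and $x'=\gamma(x-\beta t)$, the matrix $\Lambda^{\mu'}{}_{\nu}$ can be built and applied to the components of an event as follows. The helper names are hypothetical; rows are labelled by $\mu'$ and columns by $\nu$.

```python
import math


def boost_x(beta):
    """Return Lambda^{mu'}_{nu} for a boost at speed beta along x (c = 1).

    Rows are labelled by mu' (frame S'), columns by nu (frame S).
    """
    if not -1.0 < beta < 1.0:
        raise ValueError("need |beta| < 1")
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [
        [gamma, -gamma * beta, 0.0, 0.0],
        [-gamma * beta, gamma, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform(lam, x):
    """x^{mu'} = Lambda^{mu'}_{nu} x^{nu}, summing over the repeated index nu."""
    return [sum(lam[mu][nu] * x[nu] for nu in range(4)) for mu in range(4)]


event = [2.0, 1.0, 0.0, 0.0]  # (t, x, y, z) in frame S
print(transform(boost_x(0.6), event))  # approximately [1.75, -0.25, 0.0, 0.0]
```

The inner sum over the column index inside transform is exactly the summation over the repeated index $\nu$; with $\beta=0.6$ we have $\gamma=1.25$.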
An example of a vector is the infinitesimal translation $\mathrm{d}\boldsymbol{x}$, which has components $\mathrm{d}x^{\nu}$ in frame $S$. In frame $S'$, the components then change to $\mathrm{d}x^{\mu'}=\Lambda^{\mu'}{}_{\nu}\,\mathrm{d}x^{\nu}$. Noting how each component resembles a differential of a function $x^{\mu'}$, we recall that the ordinary rules of calculus also give us a rule for manipulating differentials that reads$^{9}$
Thus, transforming components from an unprimed to a primed frame uses this partial derivative, which varies a coordinate in the primed frame with respect to a coordinate in the unprimed frame, keeping other coordinates in the unprimed frame fixed. We say that the components of vectors transform like differentials.$^{10}$
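As a worked check, assume the standard boost along $x$: $t'=\gamma(t-\beta x)$, $x'=\gamma(x-\beta t)$, $y'=y$, $z'=z$. The rule for differentials then identifies each entry of the Lorentz transformation matrix with a partial derivative:
\begin{equation*}
\mathrm{d}x^{\mu'}=\frac{\partial x^{\mu'}}{\partial x^{\nu}}\,\mathrm{d}x^{\nu}
\quad\Longrightarrow\quad
\Lambda^{\mu'}{}_{\nu}=\frac{\partial x^{\mu'}}{\partial x^{\nu}},
\qquad \text{e.g.} \quad
\frac{\partial t'}{\partial t}=\gamma, \quad
\frac{\partial t'}{\partial x}=-\gamma\beta, \quad
\frac{\partial x'}{\partial t}=-\gamma\beta, \quad
\frac{\partial x'}{\partial x}=\gamma .
\end{equation*}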
A key property of the Lorentz transformation is that it preserves the length of a vector, which is a quantity obtained by taking the scalar
product of a vector with itself. The scalar product is a rule for combining vectors that we write
The object $(\boldsymbol{e}_{\mu}\cdot\boldsymbol{e}_{\nu})$ is a matrix giving a rule for combining vectors. In flat spacetime, this matrix is defined to be $\eta_{\mu\nu}\equiv\boldsymbol{e}_{\mu}\cdot\boldsymbol{e}_{\nu}$ and written out in full as
The minus sign in $\eta_{00}$ is chosen to fit with our definition of $\mathrm{d}s^{2}$, so that we have $\mathrm{d}s^{2}=\mathrm{d}\boldsymbol{x}\cdot\mathrm{d}\boldsymbol{x}$, which, when written in terms of components, becomes $\mathrm{d}s^{2}=\eta_{\mu\nu}\,\mathrm{d}x^{\mu}\,\mathrm{d}x^{\nu}$.
We can summarize the key expressions involving the Minkowski tensor as follows:
Just as an interval $\mathrm{d}s$ can be timelike, spacelike or null, we can classify a vector $\boldsymbol{X}$ in terms of its square $\boldsymbol{X}^{2}=\boldsymbol{X}\cdot\boldsymbol{X}$ by saying
Consider a light cone (see Fig. 2.4) based at a point $\mathcal{P}$. Timelike vectors starting from $\mathcal{P}$ can only exist within the forward or backward light cones. Spacelike vectors exist outside of the light cones, while null vectors lie on the light cones. Light cones are sometimes called absolute surfaces because they separate intervals and vectors in this way in every inertial frame.
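A small sketch of this classification, assuming the metric $\eta_{\mu\nu}=\mathrm{diag}(-1,1,1,1)$ with $c=1$. The helper names are hypothetical, and the tolerance tol is our own choice for coping with floating-point rounding; it is not part of the definition.

```python
def minkowski_square(X):
    """X . X = eta_{mu nu} X^mu X^nu with eta = diag(-1, 1, 1, 1), c = 1."""
    t, x, y, z = X
    return -t * t + x * x + y * y + z * z


def classify(X, tol=1e-12):
    """Timelike if X.X < 0, spacelike if X.X > 0, null if X.X = 0 (within tol)."""
    s = minkowski_square(X)
    if abs(s) <= tol:
        return "null"
    return "timelike" if s < 0 else "spacelike"


print(classify([1.0, 0.5, 0.0, 0.0]))  # timelike: inside the light cone
print(classify([1.0, 1.0, 0.0, 0.0]))  # null: on the light cone
print(classify([0.5, 1.0, 0.0, 0.0]))  # spacelike: outside the light cone
```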
Example 2.4
The Lorentz transformation preserves the length of a vector, which is therefore a Lorentz invariant. This means that we can write
$^{11}$ Some jargon: the signature of the metric tensor is the number of positive, negative or zero eigenvalues. Here, our metric is diagonal and so the eigenvalues can be read off very simply. The Minkowski metric tensor in eqn 2.15 has eigenvalues $-1$, 1, 1 and 1, and so its signature can be written as $(3,1,0)$ (3 plusses, 1 minus, no zeros) or, more commonly, as $(-,+,+,+)$ (enumerating the signs of the eigenvalues). Positive definite metrics $(+,+,+,+)$ are called Riemannian. A Lorentzian metric has a signature of one minus and the rest plusses, such as $(-,+,+,+)$ as in eqn 2.15, or one plus and the rest minuses, as in $(+,-,-,-)$. Lorentzian metrics are a special case of pseudo-Riemannian metrics.
Fig. 2.4 The anatomy of a light cone, showing timelike, spacelike and null vectors.
and hence the scalar product $\boldsymbol{X}\cdot\boldsymbol{Y}$ is Lorentz invariant. This will be very useful.
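The invariance can be checked numerically. This sketch assumes the standard boost along $x$ and uses hypothetical helper names and arbitrarily chosen components. It verifies the matrix statement $\Lambda^{\mathrm{T}}\eta\Lambda=\eta$, i.e. $\Lambda^{\alpha'}{}_{\mu}\eta_{\alpha'\beta'}\Lambda^{\beta'}{}_{\nu}=\eta_{\mu\nu}$ (with the same numerical $\eta$ in both frames), and compares $\boldsymbol{X}\cdot\boldsymbol{Y}$ in the two frames.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def boost_x(beta):
    """Lambda^{mu'}_{nu} for a boost along x (rows mu', columns nu), c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def transform(lam, X):
    """X^{mu'} = Lambda^{mu'}_{nu} X^{nu}."""
    return [sum(lam[m][n] * X[n] for n in range(4)) for m in range(4)]


def dot(X, Y):
    """X . Y = eta_{mu nu} X^mu Y^nu."""
    return sum(ETA[m][n] * X[m] * Y[n] for m in range(4) for n in range(4))


lam = boost_x(0.8)
# Matrix form of invariance: Lambda^T eta Lambda should reproduce eta.
check = matmul(transpose(lam), matmul(ETA, lam))

X = [3.0, 1.0, -2.0, 0.5]
Y = [1.0, 0.2, 0.0, 4.0]
print(dot(X, Y))                                  # -0.8 (up to rounding) in S
print(dot(transform(lam, X), transform(lam, Y)))  # the same value in S'
```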
Example 2.5
Vectors exist independently of any coordinate system. Therefore, the object $\boldsymbol{X}=X^{\mu}\boldsymbol{e}_{\mu}$ doesn't change as it is transformed. This allows us to work out how the basis vectors $\boldsymbol{e}_{\mu}$ themselves transform. We write the transformation
by analogy with eqn 2.9. In order for the vector $\boldsymbol{X}$ itself to be coordinate independent, the product of the transformations of the components $X^{\mu}$ and basis vectors $\boldsymbol{e}_{\mu}$ must yield the identity. That is to say that $\boldsymbol{X}$ can be written equivalently as
$^{12}$ Note that, in these expressions full of components, we are free to reorder the terms for convenience.
$^{13}$ The symbol $\delta^{\nu}{}_{\mu}$ is another version of the Kronecker delta, which is defined by $\delta^{\nu}{}_{\mu}=1$ when $\nu=\mu$ and $\delta^{\nu}{}_{\mu}=0$ otherwise.
This particular form of the Kronecker delta, with one index up and the other down, is needed for reasons that will be explored after we have considered tensors in Chapter 4 (see in particular Exercise 4.2).
$^{14}$ You can check this is true using the matrix in eqn 2.10, noting that the sign of the velocity $\beta=v$ is flipped in the inverse operation, which transforms back from the primed frame to the unprimed frame.
and application of eqns 2.9 and 2.25 implies that$^{12}$
where the last equality relies on $\Lambda^{\alpha'}{}_{\mu}\Lambda^{\nu}{}_{\alpha'}=\delta^{\nu}{}_{\mu}$, the identity operation.$^{13}$ Thus, we identify the inverse of the Lorentz transformation $\Lambda^{\alpha'}{}_{\mu}$, which we write
To summarize: the inverse of the matrix $\Lambda^{\alpha'}{}_{\mu}$ is the matrix$^{14}$ $\Lambda^{\mu}{}_{\alpha'}$. We deduce that the basis vectors transform using the inverse of the Lorentz transformation used for the vector components. The key equations for identifying inverses are
\begin{equation*}
\Lambda^{\alpha'}{}_{\beta}\,\Lambda^{\beta}{}_{\gamma'}=\delta^{\alpha'}{}_{\gamma'} \quad \text{and} \quad \Lambda^{\beta}{}_{\alpha'}\,\Lambda^{\alpha'}{}_{\gamma}=\delta^{\beta}{}_{\gamma} \tag{2.29}
\end{equation*}
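A quick numerical check of eqn 2.29 for a boost along $x$. This sketch assumes the standard boost matrix and, as footnote 14 notes, obtains the inverse by flipping the sign of $\beta$; the helper names are illustrative.

```python
import math


def boost_x(beta):
    """Lambda for a boost along x at speed beta (c = 1); rows index the output frame."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(A, B):
    """(AB)^alpha_gamma = A^alpha_beta B^beta_gamma, summing over the shared index beta."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


beta = 0.6
forward = boost_x(beta)    # Lambda^{alpha'}_{beta}: S -> S'
backward = boost_x(-beta)  # Lambda^{beta}_{alpha'}: S' -> S (velocity sign flipped)

for row in matmul(forward, backward):
    print([round(entry, 12) for entry in row])  # rows of the 4x4 identity
```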
Example 2.6
We saw earlier that the components of vectors, carrying an up index, transform like differentials. The rule for transforming objects with down indices, such as the basis vectors, is that they transform like derivatives
where the derivative involves varying a coordinate in the unprimed frame with respect to a coordinate in the primed frame, keeping other coordinates in the primed frame fixed. Thus, another down-indexed object, the gradient vector $\partial_{\mu}\phi\equiv\partial\phi/\partial x^{\mu}$, transforms as
The jargon is that $a^{\mu}$ transforms like a contravariant vector and $\partial\phi/\partial x^{\mu}\equiv\partial_{\mu}\phi$ transforms like a covariant vector,$^{15}$ though we avoid these terms and just note that the component $a^{\mu}$ has its index 'upstairs' and $\partial_{\mu}\phi$ has its index 'downstairs', and they then transform accordingly. We can also construct the 4-vector analogue of $\vec{\nabla}^{2}$ using$^{16}$ $\partial^{2}=\eta^{\mu\nu}\partial_{\mu}\partial_{\nu}$, and this will turn out to be very useful in the theory of gravitational waves.
In summary, we will insist that an object with indices in a downstairs position like $a_{\mu}$ transforms as
where $\Lambda^{\nu}{}_{\mu'}\equiv\partial x^{\nu}/\partial x^{\mu'}$ is the inverse of the Lorentz transformation matrix $\Lambda^{\mu'}{}_{\nu}$.
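To see $\partial^{2}$ at work, here is a finite-difference sketch restricted, for brevity, to one spatial dimension, so that $\partial^{2}f=-\partial_{t}^{2}f+\partial_{x}^{2}f$ with $c=1$. The test functions and the step size h are our own illustrative choices: a profile moving at the speed of light is annihilated by $\partial^{2}$, while one moving at half that speed is not.

```python
import math


def dalembertian(f, t, x, h=1e-3):
    """Central-difference estimate of -d^2f/dt^2 + d^2f/dx^2 (one spatial dimension, c = 1)."""
    f_tt = (f(t + h, x) - 2.0 * f(t, x) + f(t - h, x)) / (h * h)
    f_xx = (f(t, x + h) - 2.0 * f(t, x) + f(t, x - h)) / (h * h)
    return -f_tt + f_xx


def light_speed_wave(t, x):
    return math.sin(x - t)        # profile moving at speed 1


def slow_wave(t, x):
    return math.sin(x - 0.5 * t)  # profile moving at speed 0.5


print(dalembertian(light_speed_wave, 0.3, 1.2))  # approximately 0
print(dalembertian(slow_wave, 0.3, 1.2))         # approximately -0.75 * sin(1.05)
```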
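The point of using the inverse matrix is that a contraction of a downstairs index with an upstairs index comes out frame independent. A minimal sketch, assuming the standard boost along $x$ and arbitrary illustrative components: $X^{\mu}$ is transformed with $\Lambda^{\mu'}{}_{\nu}$, $a_{\mu}$ with $\Lambda^{\nu}{}_{\mu'}$, and $a_{\mu}X^{\mu}$ is compared in the two frames (no metric is needed for this contraction).

```python
import math


def boost_x(beta):
    """Lambda^{mu'}_{nu} for a boost along x (row = mu', column = nu), c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[gamma, -gamma * beta, 0.0, 0.0],
            [-gamma * beta, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


beta = 0.5
lam = boost_x(beta)       # Lambda^{mu'}_{nu}
lam_inv = boost_x(-beta)  # Lambda^{nu}_{mu'} (row = nu, column = mu')

X = [2.0, 1.0, 0.0, 3.0]   # upstairs components X^nu
a = [0.3, -1.0, 2.0, 0.5]  # downstairs components a_nu, e.g. a gradient

# Upstairs: X^{mu'} = Lambda^{mu'}_{nu} X^{nu}
X_p = [sum(lam[m][n] * X[n] for n in range(4)) for m in range(4)]
# Downstairs: a_{mu'} = Lambda^{nu}_{mu'} a_{nu} (the summed index is the row)
a_p = [sum(lam_inv[n][m] * a[n] for n in range(4)) for m in range(4)]

before = sum(a[m] * X[m] for m in range(4))     # a_mu X^mu in S
after = sum(a_p[m] * X_p[m] for m in range(4))  # a_{mu'} X^{mu'} in S'
print(before, after)  # both approximately 1.1
```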
It is a good moment to summarize some of our key results so far:
A vector $\boldsymbol{X}$ is an arrow in spacetime. It can be written in components as $\boldsymbol{X}=X^{\mu}\boldsymbol{e}_{\mu}$. The components transform according to $X^{\mu'}=\Lambda^{\mu'}{}_{\nu}X^{\nu}$, in the same way as a differential $\mathrm{d}x^{\mu}$.
The basis vectors have downstairs indices and transform according to $\boldsymbol{e}_{\mu'}=\Lambda^{\nu}{}_{\mu'}\boldsymbol{e}_{\nu}$ (i.e. the inverse transformation), in the same way as a gradient $\partial_{\mu}=\partial/\partial x^{\mu}$.
The scalar product $\boldsymbol{X}\cdot\boldsymbol{Y}=\eta_{\mu\nu}X^{\mu}Y^{\nu}$ is Lorentz invariant.
$^{15}$ These unfortunate terms are due to the English mathematician J. J. Sylvester (1814-1897). Both types of vectors transform covariantly, in the sense of 'properly', and we wish to retain this sense of the word 'covariant' rather than using it simply to label one type of object that transforms properly. Thus, we usually specify whether the indices on a particular object are 'upstairs' (like $a^{\mu}$) or 'downstairs' (like $\partial_{\mu}\phi$), and their transformation properties can then be deduced accordingly.
$^{16}$ We define $\partial^{2}$ as the scalar product $\eta^{\mu\nu}\partial_{\mu}\partial_{\nu}$ so that $\partial^{2}=-\frac{\partial^{2}}{\partial t^{2}}+\vec{\nabla}^{2}$.
We will show in Chapter 4 (page 45) that $\eta^{\mu\nu}$ is the same matrix as $\eta_{\mu\nu}$.
2.3 Examples of vectors
In ordinary Euclidean space we have vectors such as $\vec{r}$, the position vector, which is obviously an arrow in space, but also current density $\vec{J}$ and acceleration $\vec{a}$, which are also vectors but somehow live in different spaces. For spacetime vectors we have an analogous situation, and some commonly used vectors are listed in Table 2.1, all of which transform appropriately under Lorentz transformations.
Table 2.1 Commonly used 4-vectors. *The position vector transforms correctly under Lorentz transformations, but does not transform correctly under general coordinate transformations (see Chapter 3).
(a) Interval and position: The first one in our list is our old friend the interval $\mathrm{d}s$. We can also define a spacetime position 4-vector $\boldsymbol{x}$, which is the non-infinitesimal version of the same thing. In coordinates, we could write $x^{\mu}=(t, \vec{x})$. As we have seen before, its invariant (the scalar product of itself with itself) is $\boldsymbol{x}\cdot\boldsymbol{x}=-t^{2}+x^{2}+y^{2}+z^{2}=-\tau^{2}$, where, for a timelike $\boldsymbol{x}$, $\tau$ is the proper time elapsed between the origin and the event.
The position vector $x^{\mu}$ works fine in special relativity, but is the one vector in our list in Table 2.1 that will not upgrade nicely to general relativity. We will still be able to work with infinitesimal displacements, such as $\mathrm{d}x^{\mu}$. The other vectors in our list will still be extremely useful in general relativity.
$^{17}$ That is, it does transform properly under Lorentz transformations.
Velocity component trick: from the velocity vector $\boldsymbol{u}$ with components $(u^{0}, u^{i})=(\mathrm{d}t/\mathrm{d}\tau, \mathrm{d}x^{i}/\mathrm{d}\tau)$ we can extract a spatial part by saying
$^{19}$ Remember that massive particles, which we assume we're describing here, have timelike velocity vectors with negative $\boldsymbol{u}^{2}$, as we find.
Fig. 2.5 The velocities $\boldsymbol{u}$, $\boldsymbol{v}/\gamma$ and $\boldsymbol{v}_{\text{rel}}$ in the reference frame of observer U.
(b) Velocity: Next, let's try to find the velocity. It's tempting to write this as $\mathrm{d}\boldsymbol{x}/\mathrm{d}t$ with components $\mathrm{d}x^{\mu}/\mathrm{d}t=(1, \mathrm{d}\vec{x}/\mathrm{d}t)$, but this is not a 4-vector because it does not transform properly. You can see this easily by taking the scalar product of it with itself, which gives $-1+v^{2}$,
where $v=|\mathrm{d}\vec{x}/\mathrm{d}t|$ is the magnitude of the 3-velocity. This clearly depends on which frame you are in and is not an invariant. The solution is to differentiate $\boldsymbol{x}$ not with respect to time $t$ but with respect to proper time $\tau$. This gives us a velocity that is Lorentz covariant,$^{17}$ defined by $\boldsymbol{u}=\mathrm{d}\boldsymbol{x}/\mathrm{d}\tau$.
The velocity vector can be thought of as the tangent to the world line of a particle. Using the equation $\mathrm{d}t/\mathrm{d}\tau=\gamma$ from the last chapter, we can deduce that the velocity $\boldsymbol{u}$ has components$^{18}$ $u^{\mu}=\gamma(1,\vec{v})=(\gamma,\gamma\vec{v})$.
This latter expression, confirmed in the next example, is used in computations throughout the book.
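A short numerical sketch of the difference between $\mathrm{d}x^{\mu}/\mathrm{d}t$ and $u^{\mu}=(\gamma,\gamma\vec{v})$: the first has a frame-dependent square $-1+v^{2}$, while $\boldsymbol{u}\cdot\boldsymbol{u}=-1$ for any $\vec{v}$ with $|\vec{v}|<1$. The helper names are hypothetical and $c=1$.

```python
import math


def four_velocity(v):
    """u^mu = (gamma, gamma v) for a 3-velocity v with |v| < 1 (c = 1)."""
    v2 = sum(comp * comp for comp in v)
    if v2 >= 1.0:
        raise ValueError("a massive particle needs |v| < 1")
    gamma = 1.0 / math.sqrt(1.0 - v2)
    return [gamma] + [gamma * comp for comp in v]


def minkowski_dot(X, Y):
    """X . Y = -X^0 Y^0 + X^1 Y^1 + X^2 Y^2 + X^3 Y^3."""
    return -X[0] * Y[0] + sum(a * b for a, b in zip(X[1:], Y[1:]))


v = [0.3, 0.4, 0.0]  # |v| = 0.5
u = four_velocity(v)
naive = [1.0] + v    # components dx^mu/dt, which do not form a 4-vector

print(minkowski_dot(u, u))          # -1 up to rounding, whatever v is
print(minkowski_dot(naive, naive))  # about -0.75 = -1 + |v|^2, frame dependent
```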
Example 2.7
Three examples to illustrate 4-vector velocity:
(i) Define the 4-vector velocity of an observer by $\boldsymbol{u}$. In the observer's rest frame, by definition, the 3-velocity is zero. The observer's time is then the proper time, $x^{0}=\tau$. As a result, the components of the 4-velocity in the observer's rest frame are $u^{\mu}=(1,0,0,0)$. Since $\eta_{00}=-1$, we also have $\boldsymbol{u}^{2}=\eta_{00}u^{0}u^{0}=-1$, as required.
(ii) Take any 4-vector $\boldsymbol{X}$. A useful result is that the timelike component of $\boldsymbol{X}$ in the observer's frame is then given by $X^{0}_{\mathrm{obs}}=-\boldsymbol{X}\cdot\boldsymbol{u}$,
where $\boldsymbol{u}$ describes the observer's 4-velocity. Why is this? The great thing about 4-vector dot products is that if you work them out in one easy frame, the result holds for all frames. So let's choose the observer's frame, in which $u^{\mu}=(1,0,0,0)$ and $X^{\mu}=(X_{\mathrm{obs}}^{0}, X_{\mathrm{obs}}^{1}, X_{\mathrm{obs}}^{2}, X_{\mathrm{obs}}^{3})$, so that $\boldsymbol{X}\cdot\boldsymbol{u}=-X_{\mathrm{obs}}^{0}$, as required.
(iii) Two observers U and V have velocity 4-vectors $\boldsymbol{u}$ and $\boldsymbol{v}$. Let's move into U's frame of reference, in which the velocities have components $u^{\mu}=(1,\vec{0})$ and $v^{\mu}=(\gamma,\gamma\vec{v}_{\text{rel}})$, where $\gamma=(1-v_{\text{rel}}^{2})^{-1/2}$ is appropriate for the relative 3-velocity $\vec{v}_{\text{rel}}$ between the two observers. There is an elegant geometrical construction we can make by looking at $v^{\mu}/\gamma=(1,\vec{v}_{\text{rel}})$, which can be written as
where $\boldsymbol{v}_{\text{rel}}$, which has components $(0,\vec{v}_{\text{rel}})$, lies in the spacelike 3-space$^{20}$ of observer U (see Fig. 2.5). This means that $\boldsymbol{u}\cdot\boldsymbol{v}_{\text{rel}}=0$ [since $u^{\mu}=(1,\vec{0})$ and $v_{\text{rel}}^{\mu}=(0,\vec{v}_{\text{rel}})$]. Taking the scalar product of each side of eqn 2.39 with itself we have $-1/\gamma^{2}=-1+v_{\text{rel}}^{2}$, or
which is a useful result.
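Because $\boldsymbol{u}\cdot\boldsymbol{v}$ is a scalar product, the value $\boldsymbol{u}\cdot\boldsymbol{v}=-\gamma$, read off in U's frame from $u^{\mu}=(1,\vec{0})$ and $v^{\mu}=(\gamma,\gamma\vec{v}_{\text{rel}})$, holds in every frame. The sketch below (hypothetical helper names, motion along $x$ only) evaluates $-\boldsymbol{u}\cdot\boldsymbol{v}$ in a lab frame and cross-checks it against the collinear velocity-addition formula, which it assumes as known.

```python
import math


def four_velocity_x(speed):
    """u^mu for motion along x at the given speed (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - speed * speed)
    return [gamma, gamma * speed, 0.0, 0.0]


def minkowski_dot(X, Y):
    return -X[0] * Y[0] + sum(a * b for a, b in zip(X[1:], Y[1:]))


u = four_velocity_x(0.5)   # observer U, measured in some lab frame
v = four_velocity_x(-0.3)  # observer V, measured in the same lab frame
gamma_from_dot = -minkowski_dot(u, v)

# Cross-check with the collinear velocity-addition formula (assumed known)
v_rel = (0.5 - (-0.3)) / (1.0 - 0.5 * (-0.3))
gamma_rel = 1.0 / math.sqrt(1.0 - v_rel ** 2)
print(gamma_from_dot, gamma_rel)  # both about 1.392
```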
(c) Momentum: From the definition of velocity $\boldsymbol{u}$, it's a short step to the momentum $\boldsymbol{p}=m\boldsymbol{u}$, which then has invariant $\boldsymbol{p}\cdot\boldsymbol{p}=-m^{2}$ and components $p^{\mu}=(\gamma m,\gamma m\vec{v})=(E,\vec{p})$ using $E=\gamma m$ and $\vec{p}=\gamma m\vec{v}$.
Example 2.8
It's useful to remember that the 3-momentum $\vec{p}=\gamma m\vec{v}$ and energy $E=\gamma m$ are related via $\vec{p}=E\vec{v}$.
Hence, the 4-momentum $\boldsymbol{p}=m\boldsymbol{u}$ can also be written as
\begin{equation*}
p^{\mu}=\left(E, E v^{x}, E v^{y}, E v^{z}\right) \tag{2.43}
\end{equation*}
This is helpful as this expression also applies to massless particles such as the photon (whose velocity is a null vector). We therefore take this latter equation to be true for light, giving us an expression for the photon momentum.
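A sketch, under the same conventions ($c=1$, illustrative helper names, arbitrary numbers), confirming that $p^{\mu}=(E,E\vec{v})$ gives $\boldsymbol{p}\cdot\boldsymbol{p}=-m^{2}$ for a massive particle with $E=\gamma m$, and $\boldsymbol{p}\cdot\boldsymbol{p}=0$ for a photon with $|\vec{v}|=1$.

```python
import math


def four_momentum(E, v):
    """p^mu = (E, E v^x, E v^y, E v^z); v is the 3-velocity (c = 1)."""
    return [E] + [E * comp for comp in v]


def minkowski_dot(X, Y):
    return -X[0] * Y[0] + sum(a * b for a, b in zip(X[1:], Y[1:]))


# Massive particle: m = 2 and speed 0.6 along x, so gamma = 1.25 and E = 2.5
m, speed = 2.0, 0.6
E = m / math.sqrt(1.0 - speed ** 2)
p = four_momentum(E, [speed, 0.0, 0.0])
print(minkowski_dot(p, p))  # approximately -m^2 = -4

# Photon: |v| = 1, so p is null whatever its energy
photon = four_momentum(3.0, [0.6, 0.8, 0.0])
print(minkowski_dot(photon, photon))  # approximately 0
```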
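(d) Force: Newton's second law may be written in our new language as $\boldsymbol{f}=m\,\mathrm{d}\boldsymbol{u}/\mathrm{d}\tau$, or
or equivalently $\boldsymbol{f}\cdot\boldsymbol{u}=0$. That is, the 4-force is perpendicular to the 4-velocity. This follows because $\boldsymbol{u}\cdot\boldsymbol{u}=-1$ is constant along the world line, so that $\frac{\mathrm{d}}{\mathrm{d}\tau}(\boldsymbol{u}\cdot\boldsymbol{u})=2\boldsymbol{u}\cdot\frac{\mathrm{d}\boldsymbol{u}}{\mathrm{d}\tau}=0$.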
Example 2.9
The condition $\boldsymbol{f}\cdot\boldsymbol{u}=0$ provides a useful relation if we recall that $u^{\mu}=(\gamma,\gamma\vec{v})$. Writing $f^{\mu}=(f^{0},\vec{f})$, the dot product $\boldsymbol{f}\cdot\boldsymbol{u}$ yields $-\gamma f^{0}+\gamma\vec{f}\cdot\vec{v}=0$,
$^{20}$ Here spacelike 3-space simply means the parts of the space where vectors are written in terms of spacelike components $v^{i}$ with $i=1,2,3$. Equation 2.39 can be checked by substituting in the components given in the example.
$^{21}$ It is often thought that special relativity cannot treat acceleration because it only deals with inertial frames. That is not the case. At each moment of time, an accelerating object can be thought of as being in an instantaneous rest frame moving at speed $v$, but that speed varies along the trajectory.
$^{22}$ In words: the acceleration as measured in the rest frame, $\mathrm{d}\vec{v}/\mathrm{d}t$, sometimes known as the proper acceleration, is found by evaluating the invariant $\boldsymbol{a}^{2}$, which gives precisely the square of this proper acceleration.
$^{23}$ The third scalar product in eqn 2.55 gives us $-a^{0}a^{0}+a^{1}a^{1}=g^{2}$, while the second one gives us $a^{1}=(u^{0}/u^{1})a^{0}$. Putting these together gives $(a^{0})^{2}[-1+(u^{0}/u^{1})^{2}]=g^{2}$. Rearranging gives $(a^{0})^{2}=g^{2}(u^{1})^{2}/[(u^{0})^{2}-(u^{1})^{2}]=g^{2}(u^{1})^{2}$, i.e. $a^{0}=gu^{1}$
(using $-u^{0}u^{0}+u^{1}u^{1}=-1$ in the final step). The equation $a^{1}=gu^{0}$ is produced similarly. The final hyperbolic solution comes from differentiating eqn 2.56 with respect to $\tau$, giving $\mathrm{d}^{2}u^{0}/\mathrm{d}\tau^{2}=g^{2}u^{0}$ and $\mathrm{d}^{2}u^{1}/\mathrm{d}\tau^{2}=g^{2}u^{1}$, and then choosing solutions so that at proper time $\tau=0$ we have $t=0$ and $x$ is non-zero.
Fig. 2.6 The accelerated world line is a hyperbola.
and since $\mathrm{d}t/\mathrm{d}\tau=\gamma$, this simplifies to
This result is consistent with the rate at which the force does work on the particle being $\mathrm{d}E/\mathrm{d}t=\mathrm{d}p^{0}/\mathrm{d}t=\vec{F}\cdot\vec{v}$, familiar from classical mechanics.
Using $\boldsymbol{f}=m\,\mathrm{d}\boldsymbol{u}/\mathrm{d}\tau$, we can express Newton's first law in terms of the velocity 4-vector and the proper time as
\begin{equation*}
\frac{\mathrm{d}\boldsymbol{u}}{\mathrm{d}\tau}=0, \quad \text{or in component form} \quad \frac{\mathrm{d}u^{\mu}}{\mathrm{d}\tau}=0 \tag{2.51}
\end{equation*}
(e) The acceleration$^{21}$ is given by $\boldsymbol{a}=\mathrm{d}\boldsymbol{u}/\mathrm{d}\tau$ and has components $a^{\mu}=(a^{0},\vec{a})$. From eqn 2.45, we have
which implies $a^{0}=0$ in the rest frame of the observer [where $u^{\mu}=(1,0,0,0)$]. This means that, in the observer's instantaneous rest frame,$^{22}$
A body is subjected to uniform acceleration $g$ in its instantaneous rest frame, and $g$ is applied along $x^{1}$. In the instantaneous rest frame we have equations of motion
The solutions to these equations are hyperbolic sines and cosines
\begin{equation*}
t=\frac{1}{g} \sinh g \tau, \quad x=\frac{1}{g} \cosh g \tau \tag{2.57}
\end{equation*}
We conclude that the accelerated world line is the hyperbola $x^{2}-t^{2}=g^{-2}$ (see Fig. 2.6). The velocity along the world line is
\begin{equation*}
u^{0}=\frac{\mathrm{d}t}{\mathrm{d}\tau}=\cosh g\tau, \quad u^{1}=\frac{\mathrm{d}x^{1}}{\mathrm{d}\tau}=\sinh g\tau \tag{2.58}
\end{equation*}
which satisfies $\boldsymbol{u}\cdot\boldsymbol{u}=-u^{0}u^{0}+u^{1}u^{1}=-1$. The particle's 3-velocity is
\begin{equation*}
v=\frac{\mathrm{d}x^{1}}{\mathrm{d}t}=\frac{\mathrm{d}x^{1}/\mathrm{d}\tau}{\mathrm{d}t/\mathrm{d}\tau}=\tanh g\tau \tag{2.59}
\end{equation*}
Its magnitude never reaches $v=1$, but approaches it as $\tau\to\pm\infty$. The 4-acceleration is
\begin{equation*}
a^{0}=g \sinh g \tau, \quad a^{1}=g \cosh g \tau \tag{2.60}
\end{equation*}
and the magnitude is $|\boldsymbol{a}|=\sqrt{-(a^{0})^{2}+(a^{1})^{2}}=g$. The 4-force required for this acceleration is given by $f^{\mu}=ma^{\mu}$.
(f) Particle current: This example refers to a cloud of particles. The particle current is $\boldsymbol{J}=n_{0}\boldsymbol{u}$, where $n_{0}$ is the number density of particles in their rest frame and $\boldsymbol{u}$ is their velocity. In the rest frame of the particles, we can write $J^{\mu}=n_{0}(1,0,0,0)$; in a general frame $J^{\mu}=\gamma n_{0}(1,\vec{u})$, where $\vec{u}$ here denotes the particles' 3-velocity. The timelike component gives the number density $n$ [and in a general frame the density increases according to $n=\gamma n_{0}$ because of Lorentz contraction (Fig. 2.7)]. The spacelike components give the flux of particles along each direction: e.g. $J^{x}=\gamma n_{0}u^{x}=nu^{x}$ is the number of particles crossing unit area of the $yz$ plane in unit time.
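The hyperbolic world line can be checked term by term. This sketch evaluates eqns 2.57, 2.58 and 2.60 at a few proper times for an arbitrarily chosen $g$ (units with $c=1$; the function names are ours) and confirms $x^{2}-t^{2}=g^{-2}$, $\boldsymbol{u}\cdot\boldsymbol{u}=-1$, $\boldsymbol{a}\cdot\boldsymbol{a}=g^{2}$, $\boldsymbol{u}\cdot\boldsymbol{a}=0$ and $v=\tanh g\tau$.

```python
import math


def hyperbolic_motion(g, tau):
    """World line of eqns 2.57, 2.58 and 2.60 for proper acceleration g along x (c = 1)."""
    t = math.sinh(g * tau) / g
    x = math.cosh(g * tau) / g
    u = (math.cosh(g * tau), math.sinh(g * tau))          # (u^0, u^1)
    a = (g * math.sinh(g * tau), g * math.cosh(g * tau))  # (a^0, a^1)
    return t, x, u, a


def dot2(X, Y):
    """Minkowski product in the (t, x) plane: -X^0 Y^0 + X^1 Y^1."""
    return -X[0] * Y[0] + X[1] * Y[1]


g = 0.5  # illustrative value, units with c = 1
for tau in (-3.0, 0.0, 2.0):
    t, x, u, a = hyperbolic_motion(g, tau)
    print(f"tau={tau:+.1f}  x^2-t^2={x * x - t * t:.6f}  u.u={dot2(u, u):.6f}  "
          f"a.a={dot2(a, a):.6f}  u.a={dot2(u, a):.1e}  v={u[1] / u[0]:.4f}")
```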
2.4 Principle of least action
In the last section of this chapter, we shall show how some deep ideas in classical mechanics can be adapted for use in relativity. In contrast to using Newton's laws to work out how a system behaves, an alternative procedure was developed by the mathematician Joseph-Louis Lagrange to derive equations of motion from a variational principle. Thus, rather than starting with Newton's laws, we start with Hamilton's principle of least action, which we state below. The idea is to suppose that every mechanical system is characterized by a function called the Lagrangian, written$^{24}$ $L(q_{1}, q_{2}, \ldots, q_{n}, \dot{q}_{1}, \dot{q}_{2}, \ldots, \dot{q}_{n}, t)$, which is a function of the positions $q_{i}$ of each of the $n$ particles in the system, their velocities $\dot{q}_{i}=\mathrm{d}q_{i}/\mathrm{d}t$ and the time $t$. For simplicity we'll consider a single particle moving in one dimension, whose Lagrangian is then written $L(q,\dot{q},t)$. Consider the trajectory of this particle as it travels from point $\mathcal{A}$ with coordinate $q(t_{1})$ at time $t_{1}$ to a point $\mathcal{B}$ with coordinate $q(t_{2})$ at time $t_{2}$. The action for this trajectory is defined to be
\begin{equation*}
S=\int_{t_{1}}^{t_{2}} \mathrm{d}t\, L(q, \dot{q}, t) \tag{2.61}
\end{equation*}
That is, we evaluate the Lagrangian at each point along the trajectory and add these up in the integral. What is the Lagrangian? In classical mechanics, Lagrange showed that it takes the form
\begin{equation*}
L=(\text { Kinetic energy })-(\text { Potential energy }), \tag{2.62}
\end{equation*}
and we shall work this out in some particular cases in Example 2.11.
Hamilton's principle of least action says that the action, when describing the motion that actually takes place subject to the laws of physics, takes an extremal value (i.e. a maximum, minimum or stationary value). That is, if we find the path $q(t)$ that extremizes the action, we have found the path the particle takes in travelling from $\mathcal{A}$ to $\mathcal{B}$. There are many possible paths, and some of these are drawn in Fig. 2.8. Finding the path that extremizes the action is a simple problem in the calculus of variations (see Section 1.4), and from that we can immediately conclude that the equation of motion governing any particle in the Universe is given by plugging its Lagrangian into the Euler-Lagrange equation.
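For reference, when there is a single coordinate $q(t)$ with Lagrangian $L(q,\dot{q},t)$, the Euler-Lagrange equation takes the standard form
\begin{equation*}
\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial L}{\partial \dot{q}}\right)-\frac{\partial L}{\partial q}=0 .
\end{equation*}
For example, substituting $L=\frac{1}{2}m\dot{q}^{2}-V(q)$ gives $m\ddot{q}=-\partial V/\partial q$.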
Fig. 2.7 Length contraction increases the density of particles in a box owing to the shortening of the box length along the direction of travel.
Joseph-Louis Lagrange (1736-1813)
$^{24}$ We have assumed one-dimensional motion for each of the particles in this expression. In three dimensions, we write $L(\vec{q}_{1}, \vec{q}_{2}, \ldots, \vec{q}_{n}, \dot{\vec{q}}_{1}, \dot{\vec{q}}_{2}, \ldots, \dot{\vec{q}}_{n}, t)$.
Fig. 2.8 Different trajectories with $\delta q(t_{1})=\delta q(t_{2})=0$.
If the motion is in several dimensions, and consequently described by several coordinates $x^{i}=(x^{1}, x^{2}, x^{3}, \ldots)$, we have an Euler-Lagrange equation for each coordinate, giving a set of equations of motion known as the Euler-Lagrange equations
A solution can then be found by solving the entire set of equations. For our purposes, the fact that the Euler-Lagrange equations pick out the extremal values of the action SS also makes them useful in a geometrical context. We now turn to some very simple examples and applications of this concept.
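The claim that the true path makes the action stationary can be tested numerically. This is a sketch under stated assumptions: a harmonic oscillator with $m=k=1$, an action discretized with forward-difference velocities and midpoint potentials, and a hypothetical variation $\eta(t)=\sin(\pi t/T)$ that vanishes at both ends, as in Fig. 2.8. The derivative of the action with respect to the size of the variation is essentially zero for $x=\cos t$, which solves $m\ddot{x}=-kx$, but not for a constant path.

```python
import math


def action(path, dt, m=1.0, k=1.0):
    """Discretized action for L = (1/2) m xdot^2 - (1/2) k x^2.

    Velocities are forward differences; the potential uses segment midpoints.
    """
    S = 0.0
    for x0, x1 in zip(path, path[1:]):
        xdot = (x1 - x0) / dt
        xmid = 0.5 * (x0 + x1)
        S += dt * (0.5 * m * xdot ** 2 - 0.5 * k * xmid ** 2)
    return S


def first_variation(path, dt, eps=1e-3):
    """dS/d(eps) for the family path + eps * eta, with eta(t) = sin(pi t / T)."""
    n = len(path) - 1
    T = n * dt
    eta = [math.sin(math.pi * i * dt / T) for i in range(n + 1)]  # zero at both ends
    plus = [x + eps * e for x, e in zip(path, eta)]
    minus = [x - eps * e for x, e in zip(path, eta)]
    return (action(plus, dt) - action(minus, dt)) / (2.0 * eps)


N, T = 1000, 1.0
dt = T / N
times = [i * dt for i in range(N + 1)]
true_path = [math.cos(t) for t in times]  # solves m xddot = -k x for m = k = 1
wrong_path = [1.0 for _ in times]         # constant: does not solve it

print(first_variation(true_path, dt))   # essentially zero: S is stationary
print(first_variation(wrong_path, dt))  # about -2/pi: S is not stationary
```

Because this action is quadratic in the path, the central difference in eps gives the first variation without any error from the size of eps; what remains is the small discretization error from dt.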
Example 2.11
A free, non-relativistic particle has kinetic energy (1)/(2)mx^(˙)^(2)\frac{1}{2} m \dot{x}^{2} and so has Lagrangian L=(1)/(2)mx^(˙)^(2)L=\frac{1}{2} m \dot{x}^{2}. The particle obeys the Euler-Lagrange equations and here this reduces to
and we obtain x^(¨)=0\ddot{x}=0, implying that x^(˙)\dot{x} is a constant of the motion.
Repeating this in three dimensions means the Lagrangian is L=(1)/(2)m(x^(˙)^(2)+:}L=\frac{1}{2} m\left(\dot{x}^{2}+\right.y^(˙)^(2)+z^(˙)^(2)\dot{y}^{2}+\dot{z}^{2} ) and we find that x^(¨)=y^(¨)=z^(¨)=0\ddot{x}=\ddot{y}=\ddot{z}=0, implying that x^(˙),y^(˙)\dot{x}, \dot{y} and z^(˙)\dot{z} are each individually constant, and that vec(v)=(x^(˙),y^(˙),z^(˙))\vec{v}=(\dot{x}, \dot{y}, \dot{z}) is a constant vector. We each individually constant, and that v=(x,y,z)v=(x, y, z) is a constant vector. We
conclude that, in an inertial frame, free motion takes place with a velocity that is constant in magnitude and direction. This is known as the law of inertia.
For a particle in a potential V(x)V(x) we have L=(1)/(2)mx^(˙)^(2)-V(x)L=\frac{1}{2} m \dot{x}^{2}-V(x), and we obtain an equation of motion
\begin{equation*}
m \ddot{x}=-\frac{\partial V}{\partial x} \tag{2.66}
\end{equation*}
which expresses Newton's second law.
For a simple harmonic oscillator, $V(x)=\frac{1}{2} k x^{2}$, and so $L=\frac{1}{2} m v^{2}-\frac{1}{2} k x^{2}$, giving an equation of motion $m \ddot{x}=-k x$.
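Equation 2.66 turns any potential into an equation of motion. As a hedged illustration, the sketch below does the following:

- obtains $-\partial V/\partial x$ by a central finite difference for the oscillator potential $V=\frac{1}{2}kx^{2}$;
- integrates $m\ddot{x}=-\partial V/\partial x$ with the velocity-Verlet scheme;
- checks that after one period $2\pi\sqrt{m/k}$ the particle is back where it started.

The integrator, the finite-difference step and the values $m=2$, $k=8$ are our own choices, not taken from the text.

```python
import math

def force(V, x, h=1e-5):
    """-dV/dx by a central difference: the right-hand side of eqn 2.66."""
    return -(V(x + h) - V(x - h)) / (2 * h)

def integrate(V, m, x0, v0, dt, n_steps):
    """Velocity-Verlet integration of m x'' = -dV/dx; returns the final (x, v)."""
    x, v = x0, v0
    a = force(V, x) / m
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = force(V, x) / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
    return x, v

m, k = 2.0, 8.0            # illustrative values, giving omega = sqrt(k/m) = 2

def V(x):
    return 0.5 * k * x * x  # simple harmonic oscillator potential

period = 2 * math.pi * math.sqrt(m / k)
n = 10000
x, v = integrate(V, m, 1.0, 0.0, period / n, n)
print(x, v)  # close to the starting state x = 1, v = 0
```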
For Newtonian gravitation
\begin{equation*}
L=\frac{1}{2} m\left(\dot{x}^{2}+\dot{y}^{2}+\dot{z}^{2}\right)+\frac{G M m}{|\vec{r}|} \tag{2.67}
\end{equation*}
where $r^{2}=x^{2}+y^{2}+z^{2}$. We derive equations of motion
\begin{equation*}
\ddot{x}=-G M \frac{x}{r^{3}}, \quad \ddot{y}=-G M \frac{y}{r^{3}}, \quad \ddot{z}=-G M \frac{z}{r^{3}} . \tag{2.68}
\end{equation*}
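The equations of motion 2.68 can be integrated numerically. The sketch below is an illustration, not a method from the text, and rests on these assumptions:

- units with $GM=1$;
- a particle started on a circular orbit of radius 1, which then has speed 1 and period $2\pi$;
- the velocity-Verlet scheme, chosen here because it keeps the energy well controlled over an orbit.

After one period the particle should be back near its starting point, with $r$ still close to 1.

```python
import math

def acceleration(r_vec, GM=1.0):
    """Eqn 2.68: each Cartesian component of the acceleration is -GM x_i / r^3."""
    r3 = math.hypot(*r_vec) ** 3
    return [-GM * xi / r3 for xi in r_vec]

def orbit(r_vec, v_vec, dt, n_steps, GM=1.0):
    """Velocity-Verlet integration of eqn 2.68; returns the final position."""
    r_vec, v_vec = list(r_vec), list(v_vec)
    a = acceleration(r_vec, GM)
    for _ in range(n_steps):
        r_vec = [x + v * dt + 0.5 * ai * dt * dt for x, v, ai in zip(r_vec, v_vec, a)]
        a_new = acceleration(r_vec, GM)
        v_vec = [v + 0.5 * (ai + bi) * dt for v, ai, bi in zip(v_vec, a, a_new)]
        a = a_new
    return r_vec

n = 20000
final = orbit([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 2 * math.pi / n, n)
print(final, math.hypot(*final))   # near [1, 0, 0], with radius near 1
```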
Hamilton's principle of least action is closely related to Fermat's principle of least time, the idea that light chooses a route that minimizes the travel time. This gives us a way of thinking about the Lagrangian for a relativistic particle in flat spacetime. We write the action as
\begin{equation*}
S=-m \int \mathrm{d} \tau . \tag{2.69}
\end{equation*}
This is mass $m$ (which is an energy $m c^{2}$ if you put the factors of $c$ back in) multiplied by the path length in time. There is also a minus sign, which expresses that when we minimize $S$ we maximize $\int \mathrm{d} \tau$ (and we have already argued from the twin paradox that the straight-line path is the longest, not the shortest, path). As we shall see, these choices give us the correct dynamics. First note that since $\mathrm{d} \tau=\mathrm{d} t / \gamma$, eqn 2.69 gives us a very simple Lagrangian${}^{25}$
\begin{equation*}
L=-\frac{m}{\gamma} . \tag{2.70}
\end{equation*}
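To see why the minus sign makes the straight world line the one that minimizes $S$, compare proper times along two world lines joining the same pair of events. The sketch below works in units with $c=1$, with motion along $x$ only; the world lines and helper names are illustrative. It sums $\mathrm{d}\tau=\sqrt{\mathrm{d}t^{2}-\mathrm{d}x^{2}}$ over straight segments:

- the stay-at-home world line accumulates 10 units of proper time;
- a traveller who goes out and back at $v=0.6$ accumulates only 8.

So $S=-m\int\mathrm{d}\tau$ is lower for the straight path.

```python
import math

def proper_time(events):
    """Sum of d(tau) = sqrt(dt^2 - dx^2) (units with c = 1) over the straight
    segments joining successive events (t, x) on a world line."""
    tau = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        interval = dt * dt - dx * dx      # this is -ds^2 for the segment
        if interval <= 0:
            raise ValueError("segment is not timelike")
        tau += math.sqrt(interval)
    return tau

def action(events, m=1.0):
    """Free-particle action S = -m * (proper time), as in eqn 2.69."""
    return -m * proper_time(events)

stay_at_home = [(0.0, 0.0), (10.0, 0.0)]
traveller = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]   # out and back at v = 0.6
print(proper_time(stay_at_home), proper_time(traveller))   # 10.0 8.0
print(action(stay_at_home), action(traveller))             # -10.0 -8.0
```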
We can show that the Lagrangian we have identified in eqn 2.70 makes sense by taking a non-relativistic limit for small velocities
\begin{equation*}
L=-\frac{m}{\gamma}=-m\left[1-\frac{1}{2} v^{2}+\ldots\right]=-m+\frac{1}{2} m v^{2}-\ldots \tag{2.72}
\end{equation*}
Ignoring the constant $-m$, which vanishes as soon as we differentiate the Lagrangian, we have the Lagrangian $L=\frac{1}{2} m v^{2}$ for a free particle in Newtonian mechanics. Having passed this test, we realize that eqn 2.70 is a fully working Lagrangian for free particles in special relativity: $L=-m / \gamma=-m(1-v^{2})^{\frac{1}{2}}$. To analyse the mechanics of a system we need to deal with the forces that act on particles owing to the potential $V$ that they feel. In Newtonian physics, we would simply write $L=\frac{1}{2} m v^{2}-V$. However, the role of the potential energy in relativity turns out to be more subtle and depends on which potential we are considering.
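The size of the terms neglected in eqn 2.72 is easy to check numerically. The sketch below works in units with $c=1$; the function names are our own. It compares $L=-m/\gamma$ with its first two terms $-m+\frac{1}{2}mv^{2}$. The difference is dominated by the next term in the expansion, $+\frac{1}{8}mv^{4}$, so it shrinks rapidly as $v$ decreases.

```python
import math

def lagrangian_exact(v, m=1.0):
    """Free-particle Lagrangian L = -m/gamma = -m sqrt(1 - v^2), units with c = 1."""
    return -m * math.sqrt(1.0 - v * v)

def lagrangian_newtonian(v, m=1.0):
    """First two terms of eqn 2.72: -m + m v^2 / 2."""
    return -m + 0.5 * m * v * v

for v in (0.3, 0.1, 0.01):
    diff = lagrangian_exact(v) - lagrangian_newtonian(v)
    # The leftover should track the next term of the expansion, m v^4 / 8
    print(f"v = {v}: L_exact - L_approx = {diff:.3e},  m v^4/8 = {v**4 / 8:.3e}")
```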
Example 2.13
Inserting a potential $\Phi$ into the Lagrangian in eqn 2.72 would give $L=-m+\frac{1}{2} m v^{2}-m \Phi+\cdots$. Writing the action as $S=\int L \,\mathrm{d} t=-m \int \sqrt{-\mathrm{d} s^{2}}$, this would imply
\begin{equation*}
\sqrt{-\mathrm{d} s^{2}}=\left(1-\frac{1}{2} v^{2}+\Phi\right) \mathrm{d} t . \tag{2.73}
\end{equation*}
Hence, using $v^{2}=(\mathrm{d} x / \mathrm{d} t)^{2}+(\mathrm{d} y / \mathrm{d} t)^{2}+(\mathrm{d} z / \mathrm{d} t)^{2}$, we have (to leading order)${}^{26}$
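Here is a sketch of the leading-order algebra, assuming $|\Phi| \ll 1$ and $v \ll 1$ in units with $c=1$. Square eqn 2.73 and keep only the terms that survive when contributions proportional to $\Phi^{2}$, $v^{2}\Phi$ and $v^{4}$ are dropped:
\begin{align*}
-\mathrm{d} s^{2} &=\left(1-\frac{1}{2} v^{2}+\Phi\right)^{2} \mathrm{d} t^{2}=\left(1-v^{2}+2 \Phi+\Phi^{2}-v^{2} \Phi+\frac{1}{4} v^{4}\right) \mathrm{d} t^{2} \\
& \approx(1+2 \Phi) \mathrm{d} t^{2}-\mathrm{d} x^{2}-\mathrm{d} y^{2}-\mathrm{d} z^{2},
\end{align*}
that is, $\mathrm{d} s^{2} \approx-(1+2 \Phi) \mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}$.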
This is our first hint that a gravitational potential can change the metric of spacetime. However, this expression has been obtained in a non-relativistic limit, so one should exercise some caution.${}^{27}$
${}^{25}$ Hence, $S=\int L \,\mathrm{d} t=-m \int \mathrm{d} \tau$, as required.
${}^{26}$ That is, we write $\left(1-\frac{1}{2} v^{2}+\Phi\right)^{2}=1-v^{2}+2 \Phi+\Phi^{2}-v^{2} \Phi+\frac{1}{4} v^{4}$ and drop higher-order terms proportional to $\Phi^{2}$, $v^{2} \Phi$ and $v^{4}$.
${}^{27}$ The $v \ll c$ limit used in eqn 2.72 to derive eqn 2.75 means that this expression is only expected to work for timelike intervals close to the time axis, and so one might suspect that the spatial coordinates in eqn 2.75 might need tweaking due to the effects of the potential. That turns out to be the case (see eqn 5.22).
This brings us on to a very important idea. In order to describe gravitation, Einstein did not simply add a potential term to a Lagrangian, but instead showed how gravitation changes the very properties of spacetime itself. This is done by having matter directly affect the metric and, by extension, the basic rules governing the intervals in spacetime. In general relativity, the action $S=-m \int(-\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu})^{\frac{1}{2}}$ becomes
\begin{equation*}
S=-m \int\left(-g_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}\right)^{\frac{1}{2}} ,
\end{equation*}
where the gravitating masses determine the form of the metric $g_{\mu \nu}$ (which, as we shall see, can then be different from the flat spacetime metric $\eta_{\mu \nu}$).
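The action above is $-m$ times the proper time $\int(-g_{\mu\nu}\,\mathrm{d}x^{\mu}\mathrm{d}x^{\nu})^{1/2}$. This can be evaluated numerically for any metric once a world line is given.

The sketch below is hypothetical in its details. The function names are our own. The weak-field metric $g_{\mu\nu}=\mathrm{diag}(-(1+2\Phi),1,1,1)$ with $\Phi=-GM/r$ is an assumption based on the leading-order argument earlier in this section. Its spatial part needs correcting, as noted there, but that part plays no role for a clock at rest.

The result is that a static clock deeper in the potential accumulates less proper time per unit coordinate time.

```python
import math

def proper_time(metric, worldline, lam0, lam1, steps=1000):
    """tau = integral of sqrt(-g_{mu nu} dx^mu dx^nu) along worldline(lam) for lam
    from lam0 to lam1; metric(x) returns the 4x4 components g_{mu nu} at x."""
    h = (lam1 - lam0) / steps
    tau = 0.0
    for i in range(steps):
        xa, xb = worldline(lam0 + i * h), worldline(lam0 + (i + 1) * h)
        mid = [(a + b) / 2 for a, b in zip(xa, xb)]   # evaluate g at the midpoint
        dx = [b - a for a, b in zip(xa, xb)]
        g = metric(mid)
        ds2 = sum(g[mu][nu] * dx[mu] * dx[nu] for mu in range(4) for nu in range(4))
        tau += math.sqrt(-ds2)                        # requires a timelike step
    return tau

def weak_field_metric(x, GM=1.0):
    """Assumed weak-field form diag(-(1 + 2 Phi), 1, 1, 1) with Phi = -GM/r."""
    phi = -GM / math.hypot(x[1], x[2], x[3])
    return [[-(1 + 2 * phi), 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def clock_at_rest(radius):
    """World line x^mu(lam) = (lam, radius, 0, 0), parametrized by coordinate time."""
    return lambda lam: (lam, radius, 0.0, 0.0)

for radius in (10.0, 100.0):
    tau = proper_time(weak_field_metric, clock_at_rest(radius), 0.0, 1.0)
    print(radius, tau)   # sqrt(1 - 2 GM / radius) for this static clock
```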
Chapter summary
Vectors in special relativity are four-dimensional. Their components can be transformed by the Lorentz transformations, but scalar products are invariant.
The Minkowski metric tensor, with components $\eta_{\mu \nu}$, allows us to make scalar products in flat spacetime.
The relativistic action is given in terms of the interval $\mathrm{d} s$.
The velocity vector is tangent to the world line of a particle.
An observer with 4-velocity $\boldsymbol{u}$ measures the timelike component of a 4-vector $\boldsymbol{X}$ to be $-\boldsymbol{X} \cdot \boldsymbol{u}$.
Exercises
(2.1) Show that an observer travelling with 4-velocity $\boldsymbol{u}$ will deduce that, in their frame, the energy of a particle with 4-momentum $\boldsymbol{p}$ is $E=-\boldsymbol{p} \cdot \boldsymbol{u}$.
(2.2) Show that eqn 2.41 ($\boldsymbol{u} \cdot \boldsymbol{v}=-\gamma$) implies that
and that in the non-relativistic limit this reduces to $\vec{v}_{\text{rel}}=\vec{u}-\vec{v}$ or $\vec{v}_{\text{rel}}=\vec{v}-\vec{u}$. Interpret these results physically.
(2.3) The momentum vector can be written in components as
Show also that
\begin{equation*}
\boldsymbol{p} \cdot \boldsymbol{p}=\eta_{\mu \nu} p^{\mu} p^{\nu}=p_{\nu} p^{\nu}=-E^{2}+p^{2}=-m^{2} . \tag{2.80}
\end{equation*}
The gradient operator has components
Prove the following relations:
(i) $\partial^{\mu}=(-\partial / \partial t, \vec{\nabla})$;
(ii) $\partial^{2}=\partial_{\mu} \partial^{\mu}=-\partial^{2} / \partial t^{2}+\vec{\nabla}^{2}$;
(iii) $\partial_{\mu} J^{\mu}=\partial \rho / \partial t+\vec{\nabla} \cdot \vec{J}$;
(iv) $a_{\mu} u^{\mu}=0$.
(2.4) The Galaxy is about $10^{5}$ light years across and the most energetic cosmic rays known have energies of the order of $10^{19}\,\mathrm{eV}$. How long would it take a proton (rest mass $\approx 1\,\mathrm{GeV}$) with this energy to cross the Galaxy as measured in the rest frame of (i) the Galaxy and (ii) the proton?
(2.5) In its rest frame, a particle of mass $m$ will have 4-vector $p^{\mu}=(m, 0,0,0)$. Using the Lorentz transformation on this 4-vector, find the energy and momentum of a particle in a frame moving so that the particle has speed $v$. Check that the original and transformed 4-vector components give the same invariant.
(2.6) Use 4-vectors to show that an electron in free space cannot absorb a single photon.
(2.7) Using the principle of least action, calculate the shape traced out by a hanging string.
(2.8) The generalized momentum in Lagrangian mechanics is $\vec{p}=\partial L / \partial \vec{v}$. Show that with $L=-m / \gamma$, this yields $\vec{p}=m \gamma \vec{v}$.
(2.9) The Hamiltonian $H$ in Lagrangian mechanics is given by $H=\vec{p} \cdot \vec{v}-L$, where $\vec{p}$ is the momentum and $\vec{v}$ is the velocity. Show that this agrees with $E=\gamma m c^{2}$.
3.1 Coordinates in Euclidean space
3.2 Farewell to the position vector
3.3 Non-Euclidean space
Chapter summary
Exercises
Coordinates
I'll put a girdle round about the earth in forty minutes
William Shakespeare A Midsummer Night's Dream (Act II, Scene I)
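Do we actually need coordinates? In many cases, we are better off not using them. Take a relationship like the one that expresses Newton's second law
\begin{equation*}
\boldsymbol{f}=\frac{\mathrm{d} \boldsymbol{p}}{\mathrm{d} \tau} . \tag{3.1}
\end{equation*}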
This is a relationship between two vectors (arrows in spacetime) and holds irrespective of the coordinates chosen. Yes, you could choose a frame in which you can write $f^{\mu}=\mathrm{d} p^{\mu} / \mathrm{d} \tau$, but if you transform into a second frame which is rotating with respect to the first, then eqn 3.1 will emerge with extra terms (centrifugal and Coriolis forces) and will look a lot more complicated, even though the same physics is being described. It's much better, whenever possible, to stay above the fray and focus on a coordinate-free approach in which you only deal with statements expressed in purely geometrical terms.
However, particular physical problems have a very nasty habit of requiring us to dive back down into the murky world of coordinates, rather than staying aloof in our heavenly geometrical realm. Sometimes, like Puck in A Midsummer Night's Dream, we need to put a coordinate girdle around the Earth. There are often occasions when we need coordinates to express the values that quantities take in particular frames, or to allow us to exploit a particular symmetry. Therefore, in this chapter, we develop a few ideas about coordinates and, for a start, we will need to define some terms: Euclidean space, Cartesian coordinates, coordinate and non-coordinate bases, non-Euclidean space.
3.1 Coordinates in Euclidean space
Euclidean space is a set of points in $n$ dimensions in which the scalar product between two vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ is${}^{1}$ $\boldsymbol{X} \cdot \boldsymbol{Y}=\sum_{\mu} X^{\mu} Y^{\mu}$, so that the length of a vector $\boldsymbol{X}$ is $|\boldsymbol{X}|=\sqrt{\boldsymbol{X} \cdot \boldsymbol{X}}=\left(\sum_{\mu} X^{\mu} X^{\mu}\right)^{\frac{1}{2}}$ and the angle between $\boldsymbol{X}$ and $\boldsymbol{Y}$ is $\cos^{-1}(\boldsymbol{X} \cdot \boldsymbol{Y} /|\boldsymbol{X}||\boldsymbol{Y}|)$. In other words, it is the flat space you have been using all your life, equipped with geometric axioms that date back to Euclid's Elements around 300 BC. Euclid focussed on the geometry of the plane ($n=2$), showing that the internal angles in a triangle add up to $180^{\circ}$ and so forth, and so we will also choose $n=2$ for now.
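These definitions translate directly into code. The sketch below is illustrative; the triangle's vertices are arbitrary. It computes lengths and angles from the Euclidean scalar product and checks Euclid's result that the internal angles of a plane triangle sum to $\pi$. The cosine is clamped to $[-1,1]$ to guard against rounding, a detail of floating-point arithmetic rather than of the geometry.

```python
import math

def dot(X, Y):
    """Euclidean scalar product X . Y = sum over mu of X^mu Y^mu."""
    return sum(x * y for x, y in zip(X, Y))

def length(X):
    return math.sqrt(dot(X, X))

def angle(X, Y):
    """cos^{-1}(X . Y / |X||Y|), clamped against rounding just outside [-1, 1]."""
    c = dot(X, Y) / (length(X) * length(Y))
    return math.acos(max(-1.0, min(1.0, c)))

def triangle_angle_sum(P, Q, R):
    """Sum of the internal angles of the plane triangle PQR."""
    def edge(A, B):
        return [b - a for a, b in zip(A, B)]
    return (angle(edge(P, Q), edge(P, R)) + angle(edge(Q, P), edge(Q, R))
            + angle(edge(R, P), edge(R, Q)))

print(length([3.0, 4.0]))   # 5.0
print(math.degrees(triangle_angle_sum([0.0, 0.0], [4.0, 0.0], [1.0, 3.0])))
```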
Euclidean space is most often described using Cartesian coordinates, an innovation of René Descartes in 1637, and so, following his example, we describe any point in the plane using two numbers $x$ and $y$ that encode where on the Cartesian plane in Fig. 3.1(a) a particular point happens to lie. Of course, that's not the only way of describing the Euclidean plane. Polar coordinates${}^{2}$ are another option, where the same point can be described by a distance $r$ from the origin and an angle $\theta$, as shown in Fig. 3.1(b).
These two sets of coordinates are related by the familiar equations
\begin{equation*}
x=r \cos \theta, \quad y=r \sin \theta, \quad r=\left(x^{2}+y^{2}\right)^{\frac{1}{2}}, \quad \tan \theta=y / x \tag{3.2}
\end{equation*}
To express the components of a vector $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$ in terms of the new coordinates we can use the formula for coordinate transformations (eqn 2.11, reserved until now for Lorentz transformations)
Example 3.1
We will consider the action of the transformation between two-dimensional Cartesian coordinates and polar coordinates on the components of the infinitesimal-displacement vector $\mathrm{d} \boldsymbol{x}=\mathrm{d} x^{\mu} \boldsymbol{e}_{\mu}$, which has components $\mathrm{d} x^{\mu}$. For the unprimed coordinates we write $x^{\mu}=(x, y)$. For the primed coordinates we write $x^{\alpha'}=(r, \theta)$. The transformations are given by a matrix formed from the partial derivatives computed below
We can write the derivatives in matrix form $\mathrm{d} x^{\alpha'}=(\partial x^{\alpha'} / \partial x^{\mu}) \mathrm{d} x^{\mu}$, which gives a transformation matrix
and we illustrate the use of this in the following example.
${}^{2}$ Polar coordinates are our first example of curvilinear coordinates, which are sets of coordinates where Pythagoras' theorem doesn't hold simply. That is, $s^{2} \neq r^{2}+\theta^{2}$.
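The entries of this transformation matrix follow from differentiating eqn 3.2. They are $\partial r/\partial x=x/r=\cos\theta$, $\partial r/\partial y=y/r=\sin\theta$, $\partial\theta/\partial x=-y/r^{2}=-\sin\theta/r$ and $\partial\theta/\partial y=x/r^{2}=\cos\theta/r$.

As a hedged check, with illustrative helper names and sample point, the sketch below compares these analytic entries with central finite differences of the map $(x,y)\mapsto(r,\theta)$. It uses `math.atan2` so that $\theta$ lands in the correct quadrant.

```python
import math

def to_polar(x, y):
    """Eqn 3.2, (x, y) -> (r, theta); atan2 puts theta in the correct quadrant."""
    return math.hypot(x, y), math.atan2(y, x)

def jacobian_analytic(x, y):
    """Rows: r, theta. Columns: x, y. Entries are d x^{alpha'} / d x^mu."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def jacobian_numeric(x, y, h=1e-6):
    """The same matrix estimated by central differences of to_polar."""
    columns = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        plus, minus = to_polar(x + dx, y + dy), to_polar(x - dx, y - dy)
        columns.append([(p - q) / (2 * h) for p, q in zip(plus, minus)])
    return [[columns[0][0], columns[1][0]],
            [columns[0][1], columns[1][1]]]

print(jacobian_analytic(1.0, 2.0))
print(jacobian_numeric(1.0, 2.0))
```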
Fig. 3.1 (a) The point $(x_{0}, y_{0})$ in the Euclidean plane. (b) In polar coordinates, the same point is at $(r, \theta)$.
Fig. 3.2 (a) The coordinate basis $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ has the feature that the basis vectors do not stay a uniform size. In particular, the length of $\boldsymbol{e}_{\theta}$ increases with $r$, the distance from the origin. (b) The non-coordinate basis $\hat{\boldsymbol{e}}_{r}$ and $\hat{\boldsymbol{e}}_{\theta}$ remains normalized.
${}^{3}$ The word holonomy comes from holo (entire) + nomy (law).
Example 3.2
Plugging our expressions for polar coordinates into eqn 3.7 gives
The method we have used hasn't produced the usual basis vectors that we might have expected. Elementary treatments of polar coordinates usually give normalized basis vectors $\hat{\boldsymbol{e}}_{r}$ and $\hat{\boldsymbol{e}}_{\theta}$ given by
The ones we have found (without the hats) have the disquieting feature that they are not all normalized. In fact
\begin{equation*}
\boldsymbol{e}_{r} \cdot \boldsymbol{e}_{r}=1 \quad \text { and } \quad \boldsymbol{e}_{\theta} \cdot \boldsymbol{e}_{\theta}=r^{2} \tag{3.11}
\end{equation*}
So $\boldsymbol{e}_{r}$ looks fine, but $\boldsymbol{e}_{\theta}$ grows the further out you go (see Fig. 3.2). We will show that this seemingly odd property is not a bug but a feature! It's actually exactly what you need. It's helpful for two reasons:
(1) $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ were easy to derive. We just had to plug straight into eqn 2.25 ($\Lambda_{\alpha'}{}^{\mu} \boldsymbol{e}_{\mu}$) and out they popped.
(2) More importantly, $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ form a coordinate basis (also known as a holonomic basis${}^{3}$) whereas $\hat{\boldsymbol{e}}_{r}$ and $\hat{\boldsymbol{e}}_{\theta}$ form a non-coordinate basis (also known as an anholonomic basis). What does that mean? We will return to the notion of coordinate and non-coordinate bases later, but (loosely) the idea is to take a walk around a closed loop in your space and to see if you return to the starting point in the same geometric state as you started. In a coordinate basis, your basis vectors are truly independent and don't depend on each other. This means that they commute (in technical language, the Lie bracket $[\boldsymbol{e}_{r}, \boldsymbol{e}_{\theta}]=\boldsymbol{e}_{r} \boldsymbol{e}_{\theta}-\boldsymbol{e}_{\theta} \boldsymbol{e}_{r}=0$). As a result, you can make a closed path by travelling one unit along $\boldsymbol{e}_{r}$, one unit along $\boldsymbol{e}_{\theta}$, then minus one unit along $\boldsymbol{e}_{r}$ and minus one unit along $\boldsymbol{e}_{\theta}$, and you will get back to your starting point [see Fig. 3.3(a)]. This doesn't work if you use the normalized vectors (for which $[\hat{\boldsymbol{e}}_{r}, \hat{\boldsymbol{e}}_{\theta}] \neq 0$): you don't get back to your starting point [see Fig. 3.3(b)].
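Treat $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ as the derivatives of the Cartesian position $(r\cos\theta, r\sin\theta)$ with respect to $r$ and $\theta$. This gives their Cartesian components directly.

The short sketch below uses this to check three things; the function names and the sample angle are our own:

- the norms in eqn 3.11;
- that the two vectors are orthogonal;
- that rescaling $\boldsymbol{e}_{\theta}$ by $1/r$ gives the normalized $\hat{\boldsymbol{e}}_{\theta}$.

```python
import math

def coordinate_basis(r, theta):
    """Cartesian components of e_r = d(x, y)/dr and e_theta = d(x, y)/dtheta,
    where x = r cos(theta), y = r sin(theta) (eqn 3.2)."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return e_r, e_theta

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

theta = 0.4                                  # arbitrary sample angle
for r in (0.5, 1.0, 3.0):
    e_r, e_th = coordinate_basis(r, theta)
    e_th_hat = tuple(c / r for c in e_th)    # normalized, non-coordinate version
    # Columns: r, e_r.e_r (= 1), e_th.e_th (= r^2), e_th_hat.e_th_hat (= 1), e_r.e_th (= 0)
    print(r, dot(e_r, e_r), dot(e_th, e_th), dot(e_th_hat, e_th_hat), dot(e_r, e_th))
```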
Example 3.3
Consider a function $f(x^{\mu})$ which assigns a number to any spacetime point $x^{\mu}$. Now consider a path through spacetime $x^{\mu}(\lambda)$, where $\lambda$ is a number between 0 and 1. Then $f(x^{\mu}(\lambda))$ represents a function of that path parameter, giving the value that $f$ takes for every spacetime point along the path. How does $f$ change with $\lambda$? That is given by the derivative of $f$ along the path, written as
\begin{equation*}
\frac{\mathrm{d} f}{\mathrm{d} \lambda}=\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \lambda} \frac{\partial f}{\partial x^{\mu}} .
\end{equation*}
This expression looks a little like that of a vector $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$, with $\mathrm{d} x^{\mu} / \mathrm{d} \lambda$ playing the role of the components of the vector and $\partial / \partial x^{\mu}$ playing the role of the basis vectors. Consequently, we shall identify $\partial / \partial x^{\mu}$ with $\boldsymbol{e}_{\mu}$, and so in our example of polar coordinates we would write
\begin{equation*}
\boldsymbol{e}_{r}=\frac{\partial}{\partial r} \quad \text { and } \quad \boldsymbol{e}_{\theta}=\frac{\partial}{\partial \theta} \tag{3.14}
\end{equation*}
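The identification of $\partial/\partial x^{\mu}$ with $\boldsymbol{e}_{\mu}$ rests on the chain rule. Differentiating $f$ along the path gives the same result as contracting the tangent components $\mathrm{d}x^{\mu}/\mathrm{d}\lambda$ with the partial derivatives $\partial f/\partial x^{\mu}$.

The sketch below checks this numerically in a two-dimensional Cartesian setting. The field $f=x^{2}y$, the curve $(2\lambda,\sin\lambda)$ and the step size `H` are arbitrary illustrative choices.

```python
import math

H = 1e-6  # finite-difference step (illustrative)

def f(x, y):
    return x * x * y                     # an arbitrary smooth scalar field

def path(lam):
    return 2.0 * lam, math.sin(lam)      # an arbitrary curve x^mu(lambda)

def df_dlam_direct(lam):
    """Differentiate f(x^mu(lambda)) along the curve directly."""
    return (f(*path(lam + H)) - f(*path(lam - H))) / (2 * H)

def df_dlam_components(lam):
    """Sum over mu of (dx^mu/dlambda) * (partial f / partial x^mu)."""
    x, y = path(lam)
    (xp, yp), (xm, ym) = path(lam + H), path(lam - H)
    tangent = ((xp - xm) / (2 * H), (yp - ym) / (2 * H))     # components X^mu
    grad = ((f(x + H, y) - f(x - H, y)) / (2 * H),            # partial_x acts like e_x
            (f(x, y + H) - f(x, y - H)) / (2 * H))            # partial_y acts like e_y
    return sum(t * g for t, g in zip(tangent, grad))

print(df_dlam_direct(0.7), df_dlam_components(0.7))
```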
Using this trick, it is clear that these basis vectors commute ($[\boldsymbol{e}_{r}, \boldsymbol{e}_{\theta}]=0$, since mixed partial derivatives commute) and serve as a coordinate basis. If we had used the non-coordinate basis
\begin{equation*}
\hat{\boldsymbol{e}}_{r}=\frac{\partial}{\partial r} \quad \text { and } \quad \hat{\boldsymbol{e}}_{\theta}=\frac{1}{r} \frac{\partial}{\partial \theta}, \tag{3.15}
\end{equation*}
then we would have found that they do not commute, since
and so $[\hat{\boldsymbol{e}}_{r}, \hat{\boldsymbol{e}}_{\theta}]=-\frac{1}{r^{2}} \frac{\partial}{\partial \theta}=-\frac{\hat{\boldsymbol{e}}_{\theta}}{r} \neq 0$.
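The commutators can also be checked numerically by letting each basis vector act on a test function as a derivative operator. The sketch below is an illustration; the test function $f(r,\theta)=r^{2}\sin\theta$ is arbitrary, and the nested central differences use an assumed step `H`. It shows two things:

- for the coordinate pair, the bracket vanishes up to rounding;
- for the normalized pair, it reproduces $[\hat{\boldsymbol{e}}_{r},\hat{\boldsymbol{e}}_{\theta}]f=-\frac{1}{r^{2}}\partial f/\partial\theta$, which is $-\cos\theta$ for this $f$.

```python
import math

H = 1e-4  # finite-difference step (assumed; nested differences need a moderate step)

# Each vector field takes a scalar field f(r, theta) and returns the derivative
# of f along that vector field, which is again a scalar field.
def e_r(f):
    return lambda r, th: (f(r + H, th) - f(r - H, th)) / (2 * H)

def e_theta(f):
    return lambda r, th: (f(r, th + H) - f(r, th - H)) / (2 * H)

def e_r_hat(f):
    return e_r(f)

def e_theta_hat(f):
    d_theta = e_theta(f)
    return lambda r, th: d_theta(r, th) / r

def lie_bracket(X, Y):
    """[X, Y] f = X(Y f) - Y(X f)."""
    def bracket(f):
        xy, yx = X(Y(f)), Y(X(f))
        return lambda r, th: xy(r, th) - yx(r, th)
    return bracket

def f(r, th):
    return r ** 2 * math.sin(th)   # arbitrary smooth test function

r0, th0 = 2.0, 0.7
print(lie_bracket(e_r, e_theta)(f)(r0, th0))           # approximately 0
print(lie_bracket(e_r_hat, e_theta_hat)(f)(r0, th0))   # approximately -cos(0.7)
```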
3.2 Farewell to the position vector
When we first encounter vectors as students, the simplest vector that we usually start with is the position (or displacement) vector $\boldsymbol{x}=x^{\mu} \boldsymbol{e}_{\mu}$. In Chapter 2, we found that the position vector does transform appropriately under Lorentz transformations, making it possible to use it in special relativity. However, from this point onwards we will not be using it in general relativity. The reason is that it does not, in general, transform according to our rule for coordinate transformations${}^{4}$
and in curved spacetime the coefficients $A^{\mu'}{}_{\nu}$ will depend on the coordinates $x^{\nu}$. It is only in the special case${}^{5}$ that $A^{\mu'}{}_{\nu}$ is independent of $x^{\nu}$ that we can write $\partial x^{\mu'} / \partial x^{\nu}=A^{\mu'}{}_{\nu}$, and eqn 3.17 will then hold for $X^{\mu}=x^{\mu}$. This will also work in the case of linear, homogeneous transformations, such as the Lorentz transformations or spatial rotations. Another way of seeing the same thing is to notice that in a curved spacetime a displacement vector is not very well defined; it may not even live in that space. For example, if we consider only the space describing the Earth's surface, then a displacement vector from New York to Tokyo will be an arrow that ploughs through the interior of the Earth. What is well defined, though, is a path from New York to Tokyo made up of lots of infinitesimal displacements, all of which can lie on the Earth's surface.
This relationship between vectors and derivatives is explored in detail in Chapter 31. We return to non-coordinate bases in Chapter 10.
${}^{4}$ We can see this in Example 3.1, where the matrix in eqn 3.6 does not allow us to transform between components $x^{\mu'}=(r, \theta)$ and $x^{\alpha}=(x, y)$.
${}^{5}$ In flat spacetime, as assumed in special relativity, this condition holds and the displacement vector then presents no problem.
${}^{6}$ A good slogan to bear in mind is that 'coordinates are not vectors'.
Fig. 3.4 A circle of radius $r$ on the surface of a sphere of radius $R$ (in a galaxy far, far away).
Fig. 3.5 A spherical triangle is constructed by three great circles. The sum of the internal angles, $\alpha_{1}+\alpha_{2}+\alpha_{3}$, is greater than $\pi$.
${}^{7}$ Girard's theorem was originally written down by Thomas Harriot (1560-1621), who was also the first person to make a drawing of the moon through a telescope, several months before Galileo, and worked out Snell's law of refraction nearly two decades before Snell, though six centuries after Ibn Sahl. Credit for first discovery is not always apportioned fairly!
As a result, we shall now drop the position vector $\boldsymbol{x}$ from our list of well-behaved vectors that transform appropriately, since the transformation we want to use is the one described by eqn 3.17. However, we will still make use of the coordinates $x^{\mu}$ describing particular events in spacetime.${}^{6}$ We might now be concerned that not having a displacement vector prevents us from defining a velocity vector, which was previously the derivative of $\boldsymbol{x}$ with respect to the proper time $\tau$. As we'll see in Chapter 7, this concern is unfounded, as the velocity vector can be constructed geometrically from the tangent to the world line of a particle. In any case, as we have been explaining, the vector corresponding to an infinitesimal displacement in spacetime does transform correctly.
3.3 Non-Euclidean space
Non-Euclidean space is any space which is not Euclidean, i.e. not equipped with the Euclidean metric with components $\delta_{\mu \nu}$ (so that $\mathrm{d} s^{2}=\delta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}$). An example of a non-Euclidean space is the Minkowski spacetime of special relativity, in which
\begin{equation*}
\mathrm{d} s^{2}=\eta_{\mu \nu} \mathrm{d} x^{\mu} \mathrm{d} x^{\nu}=-\mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2} .
\end{equation*}
Minkowski space is known as a (3+1)-dimensional space (meaning events are described by three spatial coordinates and one time coordinate).
One of the consequences of a non-Euclidean space is that some of Euclid's famous results don't always hold.
Example 3.4
In two-dimensional Euclidean space, the circumference $C$ of a circle of radius $r$ is given by $C=2 \pi r$ and the internal angles in a triangle add up to $180^{\circ}$ (or $\pi$ radians). However, these results don't work on the surface of a sphere. The circumference of a circle of radius $r$ (measured along the surface) on the surface of a sphere of radius $R$ (see Fig. 3.4) is given by
\begin{equation*}
C=2 \pi r \operatorname{sinc} \frac{r}{R} \tag{3.20}
\end{equation*}
where $\operatorname{sinc} x=(\sin x) / x$, so $C \rightarrow 2 \pi r$ when $r \ll R$. Moreover, from Girard's theorem,${}^{7}$ the sum of the internal angles $\sum_{i} \alpha_{i}$ of a spherical triangle on the surface of a sphere (see Fig. 3.5) is given by
\begin{equation*}
\alpha_{1}+\alpha_{2}+\alpha_{3}=\pi+\frac{A}{R^{2}} , \tag{3.21}
\end{equation*}
where $A$ is the area of the triangle. Thus, the sum of the angles is greater than $\pi$, although if $A \ll R^{2}$ Euclid's result is good enough. These two results are proved in Exercises 3.3 and 3.4.
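Both spherical results can be evaluated directly. The sketch below is illustrative; the function names and sample radii are our own. It shows the ratio $C/2\pi r$ from eqn 3.20 approaching 1 as $r/R \to 0$. It also checks Girard's theorem (eqn 3.21, proved in Exercise 3.4) on the octant triangle, formed by the north pole and two equatorial points $90^{\circ}$ apart. This triangle has three right angles and covers one eighth of the sphere.

```python
import math

def circumference_on_sphere(r, R):
    """Eqn 3.20: C = 2 pi r sinc(r/R), with r measured along the surface."""
    x = r / R
    sinc = math.sin(x) / x if x != 0 else 1.0
    return 2 * math.pi * r * sinc

def girard_angle_sum(area, R):
    """Girard's theorem: internal angles of a spherical triangle sum to pi + A/R^2."""
    return math.pi + area / R ** 2

R = 1.0
for r in (1.0, 0.1, 0.01):
    # Ratio to the Euclidean value 2 pi r; it approaches 1 as r/R -> 0
    print(r, circumference_on_sphere(r, R) / (2 * math.pi * r))

# Octant triangle: three right angles, one eighth of the sphere's area 4 pi R^2
octant_area = 4 * math.pi * R ** 2 / 8
print(girard_angle_sum(octant_area, R), 3 * (math.pi / 2))
```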
Chapter summary
Euclidean space uses a metric $\delta_{\mu \nu}$ and gives us the familiar results from Euclidean geometry. It can be described using a Cartesian coordinate system ($x, y, z$, etc.), but also by other coordinate systems (e.g. plane polar coordinates in two dimensions).
The basis vectors of a new coordinate basis can be derived by transforming those of an existing coordinate basis (such as that of Cartesian coordinates). These basis vectors are independent of one another, so that they commute (their Lie bracket is zero). A non-coordinate basis does not, in general, have this property.
A non-Euclidean space has a non-Euclidean metric, but it can still be flat (i.e. not curved), and an example is the Minkowski space with metric $\eta_{\mu \nu}$.
Exercises
(3.1) Show that for polar coordinates in two dimensions
(3.3) Using simple geometry, prove eqn 3.20. By defining the curvature $K=1 / R^{2}$ for the sphere, eqn 3.20 becomes $C=2 \pi r \operatorname{sinc}(r \sqrt{K})$. Hence, show that
and hence the curvature of a sphere can be calculated by comparing the circumference to 2pi2 \pi times the radius for circles of ever-decreasing size.
(3.4) To prove Girard's theorem (i.e. to prove eqn 3.21), Fig. 3.6 may be helpful. Three great circles produce a spherical triangle of area $A$, but they also produce a second spherical triangle on the other side of the sphere. Without loss of generality, you can take the radius of the sphere to be unity, so the total surface area is then $4 \pi$. With two spherical triangles, the remaining area is then $4 \pi-2 A$. That remaining area is made up of strips like the two shown shaded in Fig. 3.6. You should be able to argue that each of those strips has area $2 \alpha_{1}-A$. Putting that together, you should then be able to deduce that $\alpha_{1}+\alpha_{2}+\alpha_{3}=\pi+A$ and hence prove the theorem.
Fig. 3.6 Construction for the proof of Girard's theorem.
(3.5) A transformation to a flat, uniformly rotating frame can be achieved via the transformation
\begin{align*}
t & =t^{\prime} \\
x & =x^{\prime} \cos \Omega t^{\prime}-y^{\prime} \sin \Omega t^{\prime} \\
y & =x^{\prime} \sin \Omega t^{\prime}+y^{\prime} \cos \Omega t^{\prime} \\
z & =z^{\prime} \tag{3.26}
\end{align*}
where $\Omega$ is the angular speed of the rotation. What form does the Minkowski metric line element $\mathrm{d} s^{2}=\mathrm{d} \boldsymbol{x} \cdot \mathrm{d} \boldsymbol{x}$ take in this rotating frame?
Linear slot machines
Thou, silent form, dost tease us out of thought As doth eternity: Cold Pastoral!
John Keats (1795-1821) Ode on a Grecian Urn (1820)
A vector can be thought of as an arrow in spacetime, but when spacetime is curved some odd things start to happen. If you travel due North from one city to another over a curved surface (see Fig. 4.1) then you might think you are following a vector in the space of that curved surface. However, following the vector takes you out of the curved surface and leaves you hovering in mid-air, suspended over your final destination! This simple example demonstrates the fact that the vectors defined at a point in a curved space don't necessarily live in that space. In fact, the vectors defined at a point in a particular space live in what is called the tangent space. For the example of the Earth's surface, the space is the sphere ($S^{2}$, in the language used by mathematicians) and the tangent space is the two-dimensional (flat) plane ($\mathbb{R}^{2}$, in the language used by mathematicians). Thus, when travelling between two cities on a curved space, the journey is best thought of as a path through the space, not a vector between the end points. Vectors are really things that tell you about the local behaviour at a point (because they exist only in the tangent space [see Fig. 4.2]). For now, it is enough to remember that vectors like $\boldsymbol{X}$ are independent of coordinates, but can be described in a particular coordinate system using basis vectors $\boldsymbol{e}_{\mu}$ and components $X^{\mu}$ in an expression $\boldsymbol{X}=X^{\mu} \boldsymbol{e}_{\mu}$. A vector has a direction and a magnitude, or length.${}^{1}$
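The Oxford-to-Durham figures quoted in Fig. 4.1 can be reproduced with a little geometry. The sketch below rests on two assumptions:

- a spherical Earth of mean radius 6371 km, a value not given in the text;
- the 337 km treated as the distance travelled along a straight line tangent to the surface at Oxford.

It computes the height above the surface at the end of that line, both exactly and with the small-distance approximation $h\approx d^{2}/2R$. Both come out at roughly 9 km.

```python
import math

R_EARTH_KM = 6371.0   # mean Earth radius (assumed value, not from the text)
d_km = 337.0          # Oxford to Durham, from Fig. 4.1

# Leave Oxford along the tangent line: the end point is a distance
# sqrt(R^2 + d^2) from the Earth's centre, so its height above the surface is
height_exact = math.sqrt(R_EARTH_KM ** 2 + d_km ** 2) - R_EARTH_KM
height_approx = d_km ** 2 / (2 * R_EARTH_KM)   # small-distance approximation
print(f"exact: {height_exact:.2f} km, approximate: {height_approx:.2f} km")
```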
Vectors are only one of the sorts of objects that we require to produce a geometrical description of Nature. In this chapter, we introduce another object that, in many ways, complements the notion of a vector. It has a rather odd name which comes about because the subject of differential geometry [pioneered by the French mathematician Élie Cartan (1869-1951)] contains the notion of what are called 'differential forms'. These can be of increasing 'degree' $p$ and are then called $p$-forms. Here we only want to consider the simplest such object ($p=1$), which is called a 1-form. Like a vector, a 1-form $\tilde{\boldsymbol{\sigma}}$ exists independently of coordinates. It can be expressed in a particular coordinate system via its components and a set of basis 1-forms $\boldsymbol{\omega}^{\mu}$, in an expression $\tilde{\boldsymbol{\sigma}}=\sigma_{\mu} \boldsymbol{\omega}^{\mu}$. Notice how the positions of the indices in the components and basis are reversed compared to vectors. Notice also our notation: $\boldsymbol{X}$ is a vector, $\tilde{\boldsymbol{\sigma}}$ is a 1-form. The tilde (the wiggly line above the symbol) signifies the 1-form.
4.1 Dot products and down vectors
4.2 Vectors and 1-forms
4.3 Transformations
4.4 Tensors
4.5 Energy-momentum tensor
Chapter summary
Exercises
Fig. 4.1 Oxford and Durham are two cities in the UK, with Durham 337 km (only about 200 miles) due North of Oxford. Travelling due North from Oxford on a straight line leaves you in mid-air, suspended about 9 km above Durham, due to the curvature of the Earth. (Travelling due South from Durham would have the same effect when arriving at Oxford, so the sense of superiority felt by the inhabitants of each city would be the same!) The diagram exaggerates the curvature of the Earth for clarity.
${}^{1}$ The length of a vector is something about which all observers agree and gives rise to the notion of an invariant equal to $\boldsymbol{X}^{2}$.
Fig. 4.2 A vector $\boldsymbol{X}$ lives in a special space called the tangent space. Points in spacetime can be described by what is called a manifold $M$. In general, this will be curved and therefore a vector cannot live in it, but only in a space which is tangent to it. For some point $p$ in the manifold, there will be a tangent space which (in the notation of differential geometry, which we generally avoid in this book) is denoted by $T_{p} M$ (which you can read as 'the tangent space at point $p$ of the manifold $M$').
Fig. 4.3 A 1-form can be described as a set of planes. The inner product between a vector $\boldsymbol{X}$ and the 1-form can then be thought of as the number of planes skewered by the vector.
${}^{2}$ We saw these in the definition of the 1-form $\tilde{\boldsymbol{Y}}=Y_{\mu} \boldsymbol{\omega}^{\mu}$ in the last section. The link to 1-forms will be made shortly.
Why do we need this additional object? The reason is that when we combine vectors and 1-forms (which, as we discuss in this chapter, involves forming an inner product), we have to produce a number (i.e. a scalar), and numbers are invariant with respect to coordinate transformations. Thus, if the components of vectors transform in one particular way under a change of coordinates, then we need the components of the object that they combine with to transform in the opposite way, so that the result of their combination is independent of the coordinate transformation. This idea might already be familiar, as it appears in quantum mechanics: a vector can be represented by a ket $|\psi\rangle$ and it combines with a bra $\langle\phi|$ to make a number $\langle\phi \mid \psi\rangle$. The kets and the bras live in different spaces. Mathematicians think of vectors as living in a vector space and of 1-forms as living in the dual space to that vector space. Thus, the 1-forms can be thought of as objects that map vectors onto real numbers. (In quantum mechanics, bras live in a dual space to the ket space and can be thought of as objects that map kets onto complex numbers.)
If a vector can be thought of as an arrow, what geometric object does a 1-form resemble? One answer is a set of equally spaced plane surfaces, as shown in Fig. 4.3. The magnitude of a 1-form corresponds to the spatial frequency of the planes (that is, the reciprocal of the distance between planes). The direction of the 1-form tells us how the planes are oriented. This is most easily seen via the basis 1-forms, which are planes arranged perpendicular to the axes of the coordinate system.
In order to understand objects like vectors and 1-forms, we shall examine the various ways that they can be combined to make numbers. What unites the methods of combining these objects is that they can be represented as machines that generate scalars. We call the machines tensors or, more colourfully, linear slot machines, since the notation we employ features slots in which to insert vectors and 1-forms (and, as we shall see, the operations are linear ones, see Section 4.3). We start by returning to a familiar way of combining two vectors to make a number: the dot product.
4.1 Dot products and down vectors
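In Chapter 2, we wrote the dot product as the component equation
\begin{equation*}
\boldsymbol{X}\cdot\boldsymbol{Y}=\eta_{\mu\nu}X^{\mu}Y^{\nu}, \tag{4.1}
\end{equation*}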
where $\eta_{\mu\nu}$ are the components of the Minkowski metric and we sum over repeated indices. The result of evaluating a dot product using eqn 4.1 is a scalar, which is to say that the result is the same no matter which coordinate system we consider. Let's consider some new ways of writing the dot product. We can simplify eqn 4.1 by absorbing the $\eta_{\mu\nu}Y^{\nu}$ part into a new object which has components with indices in the down position,${}^{2}$ and we shall call these components $Y_{\mu}$, so that $Y_{\mu}=\eta_{\mu\nu}Y^{\nu}$ and hence $\boldsymbol{X}\cdot\boldsymbol{Y}=X^{\mu}Y_{\mu}$.
Another way of looking at eqn 4.2 is that the components of the metric take the index in the up position and replace it with the index in the down position. We say that the metric $\eta_{\mu\nu}$ lowers the index.
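A minimal NumPy sketch of lowering an index and of the two ways of writing the dot product, assuming the $(-,+,+,+)$ signature used here and arbitrary, purely illustrative component values (the array names are our own):

```python
import numpy as np

# Minkowski metric components eta_{mu nu}, signature (-,+,+,+)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Illustrative up components X^mu and Y^nu
X_up = np.array([2.0, 1.0, 0.5, -3.0])
Y_up = np.array([1.5, -2.0, 4.0, 1.0])

# Lower the index: Y_mu = eta_{mu nu} Y^nu
Y_down = np.einsum("mn,n->m", eta, Y_up)

# The dot product written two ways: eta_{mu nu} X^mu Y^nu and X^mu Y_mu
dot_metric = np.einsum("mn,m,n->", eta, X_up, Y_up)
dot_lowered = np.einsum("m,m->", X_up, Y_down)

print(Y_down)
print(dot_metric, dot_lowered)
```

Only the time component changes sign on lowering, in line with Example 4.1.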
Example 4.1
Let's take a dot product of a basis vector $\boldsymbol{e}_{\lambda}$ with a vector $\boldsymbol{X}=X^{\mu}\boldsymbol{e}_{\mu}$. We have
This is true for all of the spatial components (i.e. we have $X_{1}=X^{1}$ and $X_{3}=X^{3}$), but we also have that $X_{0}=-X^{0}$, since $\eta_{00}=-1$.
We define the inverse of $\eta_{\mu\nu}$ as $\eta^{\mu\nu}$, which is to say${}^{4}$
Example 4.2
A simple way to understand the geometry of up and down components is to consider Fig. 4.4, showing a vector $\boldsymbol{X}$ expressed in a coordinate system in which the basis vectors are not orthogonal. As usual, the vector is written as
From the figure, we see that the vectors $X^{1}\boldsymbol{e}_{1}$ and $X^{2}\boldsymbol{e}_{2}$ form the usual parallelogram describing the addition of two vectors, with sides of length $X^{1}$ and $X^{2}$. We saw from the previous example that $X_{1}$ is simply the projection of $\boldsymbol{X}$ along $\boldsymbol{e}_{1}$, achieved using the dot product $\boldsymbol{X}\cdot\boldsymbol{e}_{1}$. So we have
with $j=1,2$ (in the lowered position), as shown in the figure.
The metric has components $\eta_{\mu\nu}=\boldsymbol{e}_{\mu}\cdot\boldsymbol{e}_{\nu}$, and in this coordinate system it is not diagonal. We now use the metric to raise an index, and we obtain
which means that $\eta_{00}=-1$, $\eta_{11}=1$, $\eta_{22}=1$, $\eta_{33}=1$, and all other elements are zero.${}^{4}$ This means that
and hence $\eta^{00}=-1$, $\eta^{11}=1$, $\eta^{22}=1$, $\eta^{33}=1$, and all other elements are zero. Thus, $\eta^{\mu\nu}$ and $\eta_{\mu\nu}$ act like the same matrix. This property will not hold for most second-rank tensors (i.e. for objects with two indices, to be defined later in this chapter).
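The geometry of Example 4.2 and the self-inverse property of $\eta_{\mu\nu}$ can both be checked numerically. The sketch below assumes a hypothetical 2D Euclidean basis with $\boldsymbol{e}_{2}$ at $60^{\circ}$ to $\boldsymbol{e}_{1}$; we call its metric `g` to keep it distinct from the Minkowski `eta`. Down components are computed as projections, raised again with the inverse metric, and the inverses of the two metrics are compared with the metrics themselves:

```python
import numpy as np

# Hypothetical non-orthogonal basis: e_1 along x, e_2 at 60 degrees to it
e1 = np.array([1.0, 0.0])
e2 = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])
basis = np.stack([e1, e2])          # row j holds e_j

# Metric components g_{ij} = e_i . e_j (not diagonal for this basis)
g = basis @ basis.T
g_inv = np.linalg.inv(g)            # g^{ij}

# A vector with up components X^1, X^2, so that X = X^1 e_1 + X^2 e_2
X_up = np.array([2.0, 1.0])
X = X_up @ basis                    # Cartesian representation of X

# Down components are projections onto the basis vectors: X_j = X . e_j
X_down = basis @ X
# ... which agrees with lowering by the metric: X_j = g_{jk} X^k
assert np.allclose(X_down, g @ X_up)

# Raising with the inverse metric recovers the up components
X_up_again = g_inv @ X_down

# The Minkowski metric is its own inverse; the skewed metric is not
eta = np.diag([-1.0, 1.0, 1.0, 1.0])
print(np.allclose(np.linalg.inv(eta), eta))   # True
print(np.allclose(g_inv, g))                  # False
```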
Fig. 4.4 The geometry of the up and down components. ${}^{5}$ Despite the route we have taken, it is not the case that 1-forms owe their existence to the metric or to vectors. In fact, they can exist more generally in a system where a metric is not defined. Although we shall not abandon the metric until later in the book, we turn to the more general properties of 1-forms in the next section.
Fig. 4.5 The metric tensor can be thought of as a kind of 'slot machine', written as $\boldsymbol{\eta}(\;,\;)$ in mathematical symbols; this figure gives a mental picture of the object. The machine has two slots into which you have to insert vectors. Once you have inserted them, turn the handle (meaning evaluate eqn 4.17), and out pops a number, which is the output of the machine. ${}^{6}$ This corresponds to the procedure of summing over all up and down components in expressions like $\eta_{\mu\nu}X^{\mu}u^{\nu}$.
The inner product is a linear object, which is to say that, if $a$ and $b$ are constants, we have
\begin{equation*}
\langle a\tilde{\boldsymbol{\sigma}}, b\boldsymbol{X}\rangle=ab\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}\rangle,
\end{equation*}
and also, if $\boldsymbol{Y}$ is another vector and $\tilde{\boldsymbol{\zeta}}$ another 1-form, that $\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}+\boldsymbol{Y}\rangle=\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}\rangle+\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{Y}\rangle$ and $\langle\tilde{\boldsymbol{\sigma}}+\tilde{\boldsymbol{\zeta}},\boldsymbol{X}\rangle=\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}\rangle+\langle\tilde{\boldsymbol{\zeta}},\boldsymbol{X}\rangle$. ${}^{7}$ Note that we are writing $\boldsymbol{X}(\tilde{\boldsymbol{\sigma}})=\tilde{\boldsymbol{\sigma}}(\boldsymbol{X})$, namely that a vector operating on a 1-form gives the same result as a 1-form operating on a vector. We needn't do that (the mathematics doesn't insist upon it), and, for example, in quantum mechanics the analogue doesn't hold: $\langle\sigma|X\rangle$ is the complex conjugate of $\langle X|\sigma\rangle$, and the two are only equal if $\langle\sigma|X\rangle$ is real. In general relativity, the quantities we use are real and we will always be able to assume that these give the same result.
The existence of down components implies that, just as we have vectors built from up components and basis vectors, there exist objects whose components are the down components. These objects are the 1-forms and are written as
We can think of the metric in a different way. We take the metric to be the slot machine $\boldsymbol{\eta}(\;,\;)$. This machine has two slots into which we can input vectors (see Fig. 4.5). The machine outputs a scalar, which is the dot product of the two vectors we have inserted. So take the metric $\boldsymbol{\eta}(\;,\;)$ and fill in the slots with vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ to obtain $\boldsymbol{\eta}(\boldsymbol{X},\boldsymbol{Y})$. This can be written in components as
just as we had before. The slot machine is linear, which is to say that, if $a$ and $b$ are scalars, then the following rules hold:
\begin{align*}
\boldsymbol{\eta}(a \boldsymbol{X}, b \boldsymbol{Y}) & =a b \boldsymbol{\eta}(\boldsymbol{X}, \boldsymbol{Y}) \\
\boldsymbol{\eta}(\boldsymbol{X}+\boldsymbol{Y}, \boldsymbol{Z}) & =\boldsymbol{\eta}(\boldsymbol{X}, \boldsymbol{Z})+\boldsymbol{\eta}(\boldsymbol{Y}, \boldsymbol{Z}) \tag{4.17}
\end{align*}
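One way to make the slot-machine picture concrete is to model $\boldsymbol{\eta}(\;,\;)$ as a Python function of two vectors. This is only a sketch, assuming components in an inertial frame and random test vectors; the function name `eta` is our own. The assertions check the two linearity rules of eqn 4.17:

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])    # components eta_{mu nu}

def eta(X, Y):
    """Metric slot machine: two vector slots in, one scalar out."""
    return float(X @ ETA @ Y)            # eta_{mu nu} X^mu Y^nu

rng = np.random.default_rng(0)
X, Y, Z = rng.normal(size=(3, 4))        # three random vectors (up components)
a, b = 2.5, -0.7

# The two linearity rules of eqn 4.17
assert np.isclose(eta(a * X, b * Y), a * b * eta(X, Y))
assert np.isclose(eta(X + Y, Z), eta(X, Z) + eta(Y, Z))
# The Minkowski metric is also symmetric in its two slots
assert np.isclose(eta(X, Y), eta(Y, X))
```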
We call the metric slot machine $\boldsymbol{\eta}(\;,\;)$ a $(0,2)$ tensor. The notation $(m,n)$ gives the valence of a tensor: how many indices the components have in the up ($m$) and down ($n$) positions. Since the components of the metric tensor $\boldsymbol{\eta}(\;,\;)$ are $\eta_{\mu\nu}$, we have two down indices and so $m=0$, $n=2$.
Next we identify vectors as valence $(1,0)$ tensors and 1-forms, such as $\tilde{\boldsymbol{\sigma}}=\sigma_{\mu}\boldsymbol{\omega}^{\mu}$, as $(0,1)$ tensors. In filling the slots to make a scalar, the valences of all the objects involved must sum to give $m=n$.${}^{6}$ So inputting two $(1,0)$ vectors into a $(0,2)$ tensor gives $(1,0)+(1,0)+(0,2)=(2,2)$, so that $m=n$, and this then yields a scalar.
What does the slot machine interpretation imply for vectors and 1-forms? A vector, taken as a $(1,0)$ tensor, has a slot that can be filled with a $(0,1)$ tensor to make a number. We rewrite the vector to show its slot as $\boldsymbol{X}(\;)$. Inserting a 1-form into the slot of a vector gives $\boldsymbol{X}(\tilde{\boldsymbol{\sigma}})$. This is equivalent to inserting a vector into the slot of a 1-form, $\tilde{\boldsymbol{\sigma}}(\boldsymbol{X})$. To put things on an equal footing, we write this as a linear operation known as an inner product (sometimes known as a contraction), using angle brackets as follows:${}^{7}$
To compute this, we need a rule for the inner product of the basis 1-forms and basis vectors, $\langle\boldsymbol{\omega}^{\nu},\boldsymbol{e}_{\mu}\rangle$. This is perhaps the most important rule for manipulating tensors and is given by
\begin{equation*}
\langle\boldsymbol{\omega}^{\nu},\boldsymbol{e}_{\mu}\rangle=\delta^{\nu}{}_{\mu}. \tag{4.20}
\end{equation*}
As promised at the start of this chapter, we see that the components of vectors and 1-forms are combined to make a scalar.
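The Kronecker-delta rule collapses the double sum in $\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}\rangle=\sigma_{\mu}X^{\nu}\langle\boldsymbol{\omega}^{\mu},\boldsymbol{e}_{\nu}\rangle$ to the single sum $\sigma_{\mu}X^{\mu}$. The sketch below checks this numerically; representing $\boldsymbol{e}_{\mu}$ and $\boldsymbol{\omega}^{\nu}$ as rows of identity matrices in a fixed coordinate system is an assumption made purely for illustration, and the last step anticipates the component extraction of Example 4.3:

```python
import numpy as np

dim = 4
# Illustrative assumption: in a fixed coordinate system the basis vectors
# e_mu and basis 1-forms omega^nu are represented by rows of the identity
e = np.eye(dim)        # e[mu] holds the components of e_mu
omega = np.eye(dim)    # omega[nu] holds the components of omega^nu

# Rule for basis objects: <omega^nu, e_mu> = delta^nu_mu
pairing = omega @ e.T  # pairing[nu, mu] = <omega^nu, e_mu>
assert np.array_equal(pairing, np.eye(dim))

X_up = np.array([3.0, -1.0, 2.0, 0.5])         # components X^mu
sigma_down = np.array([1.0, 4.0, -2.0, 2.0])   # components sigma_mu

# <sigma, X> = sigma_mu X^nu <omega^mu, e_nu> is a double sum ...
double_sum = sum(sigma_down[m] * X_up[n] * pairing[m, n]
                 for m in range(dim) for n in range(dim))
# ... which the Kronecker delta collapses to sigma_mu X^mu
single_sum = sigma_down @ X_up

# Feeding the basis 1-form omega^mu into X extracts the component X^mu
extracted = np.array([pairing[m] @ X_up for m in range(dim)])
```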
Example 4.3
Having basis vectors and 1-forms available gives us a simple method to extract components. An up component of a vector can be extracted by feeding a basis 1-form $\boldsymbol{\omega}^{\mu}$ into the vector's slot
We introduced the 1-form geometrically as a set of planes and the vector as an arrow. The inner product also has a geometrical interpretation: we think of the vector arrow piercing the 1-form planes, as described in the next example.
Example 4.4
In 1924, Louis Victor Pierre Raymond, 7th duc de Broglie, proposed that all particles have wave-like properties. A particle's momentum $\boldsymbol{p}$ is related to its wavevector $\boldsymbol{k}$ via $\boldsymbol{p}=\hbar\boldsymbol{k}$. Here, the magnitude of the wavevector is related to the particle's wavelength $\lambda$ via $|\boldsymbol{k}|=2\pi/\lambda$. The amplitude $\psi$ of a wave is written as a complex exponential with a phase $\phi$
We can describe the quantum wave/particle by its momentum vector. If we want to know the phase difference $\Delta\phi$ between the wave at two positions $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$, separated by a vector $\boldsymbol{x}=\boldsymbol{x}_{2}-\boldsymbol{x}_{1}$, we can evaluate $\Delta\phi=\boldsymbol{k}\cdot\boldsymbol{x}$, that is, the dot product of $\boldsymbol{k}$ and the vector $\boldsymbol{x}$ linking the two points. This works elegantly in Minkowski space. The 4-vector $\boldsymbol{k}$ is related to $\boldsymbol{p}$ by $\boldsymbol{p}=\hbar\boldsymbol{k}$, where $\boldsymbol{p}=(E,\vec{p})$ and $\boldsymbol{k}=(\omega,\vec{k})$, and now the phase is $\Delta\phi=\boldsymbol{k}\cdot\boldsymbol{x}=\vec{k}\cdot\vec{x}-\omega t$. ${}^{8}$ Recap: We previously defined a momentum vector $\boldsymbol{p}$ with components $p^{\mu}=(E,p^{x},p^{y},p^{z})$ and a velocity vector $\boldsymbol{u}$ with components $u^{\mu}=\gamma(1,v^{1},v^{2},v^{3})$. The Minkowski tensor can be used to produce down versions $p_{\mu}=(-E,p^{x},p^{y},p^{z})$ and $u_{\mu}=\gamma(-1,v^{1},v^{2},v^{3})$. ${}^{9}$ See eqn 2.38 [$X_{0}^{\text{obs}}=-\boldsymbol{X}\cdot\boldsymbol{u}$] or, using this chapter's ideas, $X_{0}^{\text{obs}}=-\langle\tilde{\boldsymbol{X}},\boldsymbol{u}\rangle\equiv-\tilde{\boldsymbol{X}}(\boldsymbol{u})$. ${}^{10}$ Recall that in the particle's rest frame we have components $u^{\mu}=(1,0,0,0)$, so we can write $\boldsymbol{u}=u^{\mu}\boldsymbol{e}_{\mu}=\boldsymbol{e}_{0}$. ${}^{11}$ Once again, we can use eqn 2.38, but this time with $\boldsymbol{J}$ as a 4-vector.
Note that in these examples we could turn things around. For example, we could treat $\tilde{\boldsymbol{u}}$ as a 1-form and $\boldsymbol{J}$ as a vector, and then write the final answer as $\tilde{\boldsymbol{u}}(\boldsymbol{J})=-n$. In component form, eqns 4.27 and 4.28 can be written as $p_{\mu}u^{\mu}=-E$ and $J_{\mu}u^{\mu}=-n$. ${}^{12}$ This should be unsurprising, since we have already done this using components in eqn 4.2, using the metric tensor to convert an object with up indices into one with down indices. Here, though, we are doing it entirely geometrically, without worrying about the components.
There is another way. Instead of a momentum vector, we can imagine equally spaced, parallel surfaces separated by a distance proportional to the wavelength of the wave. We'll call this set of surfaces the wave's 1-form $\tilde{\boldsymbol{k}}$. They are in fact the surfaces of constant phase in the wave. Now if we want to know the phase difference between two points, we simply evaluate the inner product $\langle\tilde{\boldsymbol{k}},\boldsymbol{x}\rangle$, which we can think of as a machine that counts the number of the surfaces of $\tilde{\boldsymbol{k}}$ that the vector $\boldsymbol{x}$ pierces (see Fig. 4.3). We have
\begin{equation*}
\Delta \phi=\langle\tilde{\boldsymbol{k}}, \boldsymbol{x}\rangle=\text { (number of surfaces pierced). } \tag{4.25}
\end{equation*}
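As a numerical check of eqn 4.25, the sketch below lowers the index of an illustrative wave 4-vector $k^{\mu}=(\omega,\vec{k})$ with the Minkowski metric and contracts it with a separation $x^{\mu}=(t,\vec{x})$. The numbers are arbitrary and we assume units with $c=1$:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Illustrative wave: angular frequency omega and spatial wavevector k_vec
omega = 2.0
k_vec = np.array([1.0, 0.5, 0.0])
k_up = np.concatenate(([omega], k_vec))        # k^mu = (omega, k_vec)

# Separation between the two events: x^mu = (t, x_vec)
t = 0.75
x_vec = np.array([3.0, -2.0, 1.0])
x_up = np.concatenate(([t], x_vec))

# Components of the 1-form: k_mu = eta_{mu nu} k^nu = (-omega, k_vec)
k_down = eta @ k_up

# <k~, x> = k_mu x^mu, compared with k_vec . x_vec - omega t
delta_phi = k_down @ x_up
expected = k_vec @ x_vec - omega * t
```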
See from Fig. 4.3 how a vector appears to pierce some number of the 1-form's planes: this number is equal to the inner product. We input a 1-form into the inner product's first slot, and a vector into the second. The inner product slot machine outputs a number telling us how many 1-form planes are pierced by the vector, that is,
\begin{equation*}
\langle\tilde{\boldsymbol{\sigma}}, \boldsymbol{X}\rangle=\binom{\text { Number of planes of the 1-form } \tilde{\boldsymbol{\sigma}}}{\text { pierced by the vector } \boldsymbol{X}} \tag{4.26}
\end{equation*}
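Equation 4.26 can be illustrated in a flat 2D Euclidean plane. In this hypothetical sketch (the function names are our own), a 1-form is specified by the unit normal of its planes and their spacing $d$, so its components are $\hat{n}/d$ and its magnitude is the spatial frequency $1/d$. Counting the planes crossed by a vector from different starting points gives either the whole-number part of $\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}\rangle$ or one more than it, so the count is exact from every starting point only when the inner product is itself a whole number:

```python
import math

def one_form_components(normal_angle, spacing):
    """Components of a 1-form whose planes have unit normal at `normal_angle`
    and separation `spacing`: sigma_j = n_j / spacing (magnitude 1 / spacing)."""
    return (math.cos(normal_angle) / spacing, math.sin(normal_angle) / spacing)

def planes_crossed(sigma, start, X):
    """Number of planes sigma_j x^j = integer between `start` and `start + X`
    (a plane exactly at the start is not counted, one exactly at the end is)."""
    def level(p):
        return sigma[0] * p[0] + sigma[1] * p[1]
    end = (start[0] + X[0], start[1] + X[1])
    return math.floor(level(end)) - math.floor(level(start))

sigma = one_form_components(math.pi / 6, 0.5)   # planes 0.5 apart
X = (3.0, 1.0)
inner = sigma[0] * X[0] + sigma[1] * X[1]       # <sigma, X> = sigma_j X^j

# Count from several starting points along the x axis
counts = [planes_crossed(sigma, (0.1 * i, 0.0), X) for i in range(10)]

# When <sigma, X> is a whole number (here 3), the count is exact from any start
exact = [planes_crossed((2.0, 0.0), (0.37 * i, 0.2), (1.5, 4.0)) for i in range(5)]
print(round(inner, 3), counts, exact)
```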
With 1-forms as part of our machinery, a natural question is what physical quantities they represent. We examine this in the next example.
Example 4.5
The momentum of a particle${}^{8}$ can be represented by a 1-form, whose components are given from the Lagrangian by $p_{\mu}=\partial L/\partial\dot{x}^{\mu}$. So a particle has momentum 1-form $\tilde{\boldsymbol{p}}(\;)$. Insert the velocity 4-vector $\boldsymbol{u}$ into its slot and we output a number. This quantity is${}^{9}$ minus the energy, $-E$, of the particle as measured by an observer $O_{\boldsymbol{u}}$ with velocity vector $\boldsymbol{u}$ tangent to their world line. That is
\begin{equation*}
\tilde{\boldsymbol{p}}(\boldsymbol{u})=-E=-\binom{\text { Energy of particle }}{\text { measured by } O_{u}} . \tag{4.27}
\end{equation*}
We can test this equation in the particle's rest frame,${}^{10}$ in which $O_{\boldsymbol{u}}$ has $\boldsymbol{u}=\boldsymbol{e}_{0}$, which means $E=-p_{\mu}\langle\boldsymbol{\omega}^{\mu},\boldsymbol{e}_{0}\rangle=-p_{0}$. Using the Minkowski metric, we have $-p_{0}=p^{0}$, which is indeed the particle's energy.
An observer with velocity $\boldsymbol{u}$ passes through a cloud of dust carrying a small, permeable box of known spatial volume (known as a 3-volume). The observer makes measurements by counting the number of particles in the box. We define the particle current 1-form $\tilde{\boldsymbol{J}}(\;)$. Insert${}^{11}$ the velocity vector $\boldsymbol{u}$ of the observer and output (minus) the number density of particles, $-n$, measured by the observer with velocity $\boldsymbol{u}$. That is
\begin{equation*}
\tilde{\boldsymbol{J}}(\boldsymbol{u})=-n=-\binom{\text { number density of particles }}{\text { measured by } O_{u}} . \tag{4.28}
\end{equation*}
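Equations 4.27 and 4.28 can be evaluated for two observers. The sketch assumes a dust cloud moving at speed $0.6$ along $x$ with illustrative values $m=1$ and $n_{0}=10$ (units with $c=1$); the helper names are our own. The laboratory observer has $\boldsymbol{u}=\boldsymbol{e}_{0}$, while the comoving observer rides with the dust:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def four_velocity(v):
    """u^mu = gamma (1, v) for a 3-velocity v (units with c = 1)."""
    v = np.asarray(v, dtype=float)
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    return gamma * np.concatenate(([1.0], v))

m, n0 = 1.0, 10.0                        # illustrative mass and rest-frame density
u_dust = four_velocity([0.6, 0.0, 0.0])
p_down = eta @ (m * u_dust)              # components p_mu of the momentum 1-form
J_down = eta @ (n0 * u_dust)             # components J_mu of the current 1-form

def measured_energy(u_obs):
    return -(p_down @ u_obs)             # E = -p~(u), eqn 4.27

def measured_density(u_obs):
    return -(J_down @ u_obs)             # n = -J~(u), eqn 4.28

lab = four_velocity([0.0, 0.0, 0.0])     # observer with u = e_0
comoving = u_dust                        # observer moving with the dust
```

Here $\gamma=1.25$, so the laboratory observer finds $E=\gamma m$ and $n=\gamma n_{0}$, while the comoving observer recovers the rest-frame values $m$ and $n_{0}$.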
At the start of the chapter, we described vectors as living in a tangent space. 1-forms live in a different space, known as a dual space. Equation 4.20 ($\langle\boldsymbol{\omega}^{\nu},\boldsymbol{e}_{\mu}\rangle=\delta^{\nu}{}_{\mu}$) gives the relationship between these spaces. An interesting question is whether it is possible to map objects between these two spaces, such that one could take a vector and then find an equivalent 1-form. Such a mapping is carried out using the metric tensor.${}^{12}$ Notice that the inner product $\langle\tilde{\boldsymbol{\sigma}},\boldsymbol{X}\rangle=\sigma_{\mu}X^{\mu}$ is just as if we had taken the dot product of two vectors $\boldsymbol{X}=X^{\mu}\boldsymbol{e}_{\mu}$ and $\boldsymbol{\sigma}=\sigma^{\nu}\boldsymbol{e}_{\nu}$ (i.e. the components of the 1-form with the index raised). In fact, we have
This allows us to read off what happens if we fill in just one slot in the metric, $\boldsymbol{\eta}(\boldsymbol{\sigma},\;)$. Since, upon doing this, we still have one slot left to input a vector, the output must be a $(0,1)$ tensor, also known as a 1-form. We conclude that
That is, the metric slot machine maps vectors onto 1-forms.
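Filling one slot of $\boldsymbol{\eta}$ can be mimicked with a closure: the function below returns a new machine with a single remaining vector slot, together with its components $\sigma_{\nu}$. This is a hypothetical sketch (the names `metric` and `fill_first_slot` are our own), assuming inertial-frame components:

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])

def metric(X, Y):
    """The (0,2) metric slot machine eta(X, Y) = eta_{mu nu} X^mu Y^nu."""
    return float(X @ ETA @ Y)

def fill_first_slot(X):
    """Fill one slot of eta with X, leaving the 1-form eta(X, ): a machine
    that still accepts one vector. Returns the machine and its components."""
    X_down = ETA @ X                   # X_nu = eta_{mu nu} X^mu
    def one_form(Y):
        return float(X_down @ Y)       # <X~, Y> = X_nu Y^nu
    return one_form, X_down

sigma = np.array([1.0, 2.0, -1.0, 0.5])
Y = np.array([0.3, -1.0, 4.0, 2.0])
sigma_form, sigma_down = fill_first_slot(sigma)
```

Feeding `Y` to `sigma_form` gives the same number as `metric(sigma, Y)`, and applying the inverse metric to `sigma_down` maps the 1-form back to the original vector.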
4.3 Transformations
We have stressed the role of transformations between sets of coordinates. The inner product $X_{\alpha'}Y^{\alpha'}$ is a scalar and should, therefore, be coordinate invariant. Recall that the up components of a vector transform as
where $x^{\alpha'}$ and $x^{\mu}$ are two sets of coordinates.${}^{13}$ Since we have that $\Lambda^{\nu}{}_{\alpha'}\Lambda^{\alpha'}{}_{\mu}=\delta^{\nu}{}_{\mu}$, it must be the case that the down components transform as
Compared with the results from the previous chapter, we see that the down components transform in the same way as the basis vectors $\boldsymbol{e}_{\mu}$.${}^{14}$ It should then come as no surprise that the basis 1-forms transform in the same way as the vector components, as demonstrated in the next example.
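The opposite transformation behaviour of up and down components can be checked with a Lorentz boost. In this sketch we assume, for illustration, a boost with speed $0.8$ along $x$ as the matrix $\Lambda^{\alpha'}{}_{\mu}$, transform $X^{\mu}$ with it and $Y_{\mu}$ with its inverse, and confirm that $Y_{\mu}X^{\mu}$ is unchanged, whereas naively applying $\Lambda$ to the down components would give different numbers:

```python
import numpy as np

def boost_x(beta):
    """Lambda^{alpha'}_mu for a boost with speed beta along x (c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

Lam = boost_x(0.8)               # Lambda^{alpha'}_mu
Lam_inv = np.linalg.inv(Lam)     # Lambda^{nu}_{alpha'}

X_up = np.array([1.0, 2.0, -0.5, 3.0])     # vector components X^mu
Y_down = np.array([-2.0, 1.0, 0.0, 4.0])   # 1-form components Y_mu

# Up components: X^{alpha'} = Lambda^{alpha'}_mu X^mu
X_up_new = Lam @ X_up
# Down components: Y_{alpha'} = Lambda^{nu}_{alpha'} Y_nu (inverse matrix)
Y_down_new = np.einsum("na,n->a", Lam_inv, Y_down)

# The contraction is the same number in both coordinate systems
before = Y_down @ X_up
after = Y_down_new @ X_up_new
```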
Example 4.6
We can see how to transform basis 1-forms by considering the contraction $\langle\boldsymbol{\omega}^{\beta},\boldsymbol{e}_{\alpha}\rangle=\delta^{\beta}{}_{\alpha}$. First, multiply through by a coordinate transformation $\Lambda^{\alpha}{}_{\gamma'}$ to find
Comparing against $\langle\boldsymbol{\omega}^{\sigma'},\boldsymbol{e}_{\gamma'}\rangle=\delta^{\sigma'}{}_{\gamma'}$, we conclude
where $X_{\nu}$ are the components of the 1-form $\tilde{\boldsymbol{X}}=X_{\alpha}\boldsymbol{\omega}^{\alpha}$. This means that in Example 4.4, the 1-form $\tilde{\boldsymbol{k}}$ has components $k_{\mu}=\eta_{\mu\nu}k^{\nu}=(-\omega,\vec{k})$, and so $\langle\tilde{\boldsymbol{k}},\boldsymbol{x}\rangle$ yields the required phase. ${}^{13}$ As usual, we denote one coordinate system with primed indices and one without primes. ${}^{14}$ The reason is the same: we want both (i) the vector $X^{\mu}\boldsymbol{e}_{\mu}$ and (ii) the scalar $X_{\mu}Y^{\mu}$ to be independent of coordinates. This also explains our notation, with 1-form components and basis vectors both carrying an index in the down position: this tells us that they transform the same way. ${}^{15}$ One thing that the bold-symbol notation for a tensor $\boldsymbol{T}(\;,\;,\;)$ lacks is clear guidance on the valence of the tensor, which must be given separately in the form $(m,n)$. One solution to this is to use abstract index notation, invented by Roger Penrose (1931-). The idea here is to specify the slots using indices, so a $(2,1)$ tensor would be written as $T^{ab}{}_{c}$. The indices here are not the components; to express those, we need to ensure that we specify components and slots with different letters. One common convention is to use Roman letters for slots and Greek letters for components, so that the components of $\boldsymbol{T}$ would be $T^{\mu\nu}{}_{\rho}$. Clearly there is potential for confusion here, so it's necessary to know the convention being adopted.
To extract a number from a tensor, we insert 1-forms $\tilde{Z}_{a}$ and $\tilde{Z}_{b}$ and a vector $A^{c}$, balancing Roman indices to obtain
(4.38)
where we note that the letters denote the relevant slot, rather than an instruction to sum on an index. We don't use abstract index notation here, although some of the more advanced textbooks in the subject (e.g. Wald) do use it. ${}^{16}$ A mixed object is a tensor with a valence $(m,n)$ where $m,n\neq0$, that is, a tensor whose components carry both up and down indices.
Fig. 4.6 The tensor as a slot machine. It has $m$ slots for 1-forms and $n$ slots for vectors. If you insert those and turn the handle (metaphorically), then it spits out a number.
We can now summarize how to transform components and basis vectors:
Let's now look at the general concept of a tensor. Generally speaking, the tensor $\boldsymbol{T}$ is a linear slot machine with $m$ slots for inputting 1-forms and $n$ slots for inputting vectors (see Fig. 4.6). We have to specify how many of each by specifying the valence $(m,n)$ of the tensor.${}^{15}$
For vectors we can write an expression relating the vector to its components and basis vectors, $\boldsymbol{X}=X^{\mu}\boldsymbol{e}_{\mu}$, and an analogous expression for 1-forms, $\tilde{\boldsymbol{\sigma}}=\sigma_{\mu}\boldsymbol{\omega}^{\mu}$. To write a similar expression for tensors, we need to use the outer product between basis vectors (and basis 1-forms), denoted by $\otimes$. This symbol is simply a means of denoting the slot-machine character of the tensor. Its key property is that it maintains the ordering of the slots. This idea is best understood by considering an example.
Example 4.7
Consider a tensor $\boldsymbol{e}_{1}\otimes\boldsymbol{e}_{2}$: in words, the outer product of the basis vector in the 1 direction and the basis vector in the 2 direction. This is an object with two slots that takes two 1-forms. The outer product symbol $\otimes$ simply tells us that the first slot refers to $\boldsymbol{e}_{1}$ and the second to $\boldsymbol{e}_{2}$. Inserting 1-forms $\tilde{\boldsymbol{\alpha}}=\alpha_{\mu}\boldsymbol{\omega}^{\mu}$ and $\tilde{\boldsymbol{\beta}}=\beta_{\nu}\boldsymbol{\omega}^{\nu}$, we have
For a mixed object${}^{16}$ like $\boldsymbol{\omega}^{2}\otimes\boldsymbol{e}_{3}$, that is, a tensor formed from the outer product of the basis 1-form for the 2 direction and the basis vector for the 3 direction, let's enter a vector $\boldsymbol{v}=v^{\mu}\boldsymbol{e}_{\mu}$ in the first slot and a 1-form $\tilde{\boldsymbol{\alpha}}$ in the second to find
This is an object into which we can insert three 1-forms and a vector. Inserting 1-forms $\tilde{\boldsymbol{\zeta}}$, $\tilde{\boldsymbol{\eta}}$ and $\tilde{\boldsymbol{\chi}}$ and the vector $\boldsymbol{u}$ into our example tensor $\boldsymbol{S}$, we find
Tensors are independent of coordinate system, but their components depend on the details of the coordinates. How do tensor components transform? We use the tensor transformation law, which says that the transformation is carried out by multiplying by transformation matrices, one for each index. Specifically, every up index $\mu$ is transformed by a matrix $\Lambda^{\alpha'}{}_{\mu}=\partial x^{\alpha'}/\partial x^{\mu}$ and every down index $\sigma$ is transformed by a matrix $\partial x^{\sigma}/\partial x^{\beta'}$. Our example tensor therefore transforms as
In coordinate-based treatments, the tensor transformation law is used to define tensors, but we prefer to use the slot-machine definition, which is much cleaner.${}^{17}$
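The outer product and the tensor transformation law can be sketched together for the tensor $\boldsymbol{e}_{1}\otimes\boldsymbol{e}_{2}$ of Example 4.7. Assuming an illustrative boost along $x$ and arbitrary 1-form components, the sketch transforms each up index of the tensor with $\Lambda$ and each 1-form's down index with the inverse matrix, then checks that the number produced by filling the slots is the same in both coordinate systems:

```python
import numpy as np

def boost_x(speed):
    """Lambda^{alpha'}_mu for a boost with the given speed along x (c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - speed**2)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * speed
    return L

e = np.eye(4)                    # e[mu]: components of basis vector e_mu
T = np.outer(e[1], e[2])         # e_1 (x) e_2, components T^{mu nu}

alpha = np.array([0.5, 2.0, -1.0, 3.0])   # 1-form components alpha_mu
beta = np.array([1.0, -4.0, 6.0, 0.25])   # 1-form components beta_nu

# Fill both slots: T(alpha, beta) = T^{mu nu} alpha_mu beta_nu = alpha_1 beta_2
value = np.einsum("mn,m,n->", T, alpha, beta)

# Tensor transformation law: one Lambda for each up index ...
Lam = boost_x(0.6)
Lam_inv = np.linalg.inv(Lam)
T_new = np.einsum("am,bn,mn->ab", Lam, Lam, T)
# ... and the inverse matrix for each 1-form's down index
alpha_new = np.einsum("ma,m->a", Lam_inv, alpha)
beta_new = np.einsum("nb,n->b", Lam_inv, beta)

value_new = np.einsum("ab,a,b->", T_new, alpha_new, beta_new)
```

The components in `T_new` differ from those in `T`, but the output of the slot machine does not, which is the sense in which the tensor itself is coordinate independent.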
Example 4.9
The $(0,2)$ metric tensor is written as
This tensor has components that can be extracted: $\boldsymbol{\eta}(\boldsymbol{e}_{\alpha},\boldsymbol{e}_{\beta})=\eta_{\alpha\beta}$. We can now see explicitly what happens if we insert a vector $\boldsymbol{v}$ into one of the slots
where in the penultimate line we've used the components of the metric tensor to lower an index. We see that the output is, as we predicted, a 1-form with components $v_{\nu}$.
The tensor above has valence $(0,2)$, but we can also define a $(2,0)$ version
where $\eta_{\mu\nu}\eta^{\nu\sigma}=\delta^{\sigma}{}_{\mu}$ (which, for the Minkowski metric in the inertial coordinates used here, means that $\eta^{\mu\nu}$ and $\eta_{\mu\nu}$ have the same numerical values). The $(2,0)$ version of the tensor inputs two 1-forms and can map a 1-form to a vector
Finally, note that we can use the metric tensor on the components of a tensor to raise or lower its indices, one at a time. So we have, for example, that
${}^{17}$ There is an unfortunate tendency for general relativity to become 'death by indices'. Our use of coordinate-free objects, such as $\boldsymbol{S}(\;,\;,\;,\;)$, is intended to avoid this, and this way of writing equations is sometimes called index-free notation. Imagine if you had learnt electromagnetism just in terms of coordinates, without ever having seen vector notation. Efficient notation can declutter equations and (hopefully) make the physics more transparent. ${}^{18}$ Spoiler alert: the Einstein equation (which we will get to properly at the end of Part II) has the form
\begin{equation*}
\left(\begin{array}{c}\text{Curvature}\\ \text{of}\\ \text{spacetime}\end{array}\right)=\left(\begin{array}{c}\text{Mass-energy}\\ \text{density at}\\ \text{this point}\end{array}\right).
\end{equation*}
The right-hand side of this equation will be related to the energy-momentum tensor. ${}^{19}$ Astrophysicists like talking about dust, as there's a lot of it about in the Universe. The term refers to solid particles that can be anything from a few molecules up to macroscopic size, and for our purposes we are going to assume that they are just bits of mass, distributed in space, at a low enough density that they don't interact with each other.
and so on.
This section has contained a lot of formalism, but let's finish it with a couple of very simple corollaries that shouldn't be forgotten amidst all the mathematical manipulations.
If two tensors $\boldsymbol{A}$ and $\boldsymbol{B}$ are equal to each other in one frame, they will be equal to each other in all frames. This is obvious if you think of the tensors in a coordinate-free way. Alternatively, construct the tensor $\boldsymbol{C}=\boldsymbol{A}-\boldsymbol{B}$, which is identically zero, so all its components are zero. Since each component in a new frame is a sum of products of transformation matrices with the old components, all the components of $\boldsymbol{C}$ remain zero in every other frame too.
A scalar [which is a $(0,0)$ tensor] takes the same numerical value in all frames. (An example is the Ricci scalar, to be introduced in Chapter 11.) Therefore, if you evaluate a scalar in the most convenient frame, you have got it for all frames.
4.5 Energy-momentum tensor
As a payoff for all of this formalism, we introduce one of the most important tensors in all of physics: the energy-momentum tensor. This tensor gives the (physical) right-hand side of the Einstein equation of general relativity.${}^{18}$
Let's start off by considering a set of dust particles in spacetime,${}^{19}$ each of mass $m$, and imagine that in some frame $S$ these particles are all distributed in space but are at rest. Their energy will just be $m$ per particle (remember that, if we reinstate the factors of $c$, this would be $mc^{2}$ per particle), and if there are $n_{0}$ particles per unit volume, the energy density will be $n_{0}m$. In another inertial frame $S'$, the energy becomes $\gamma m$ per particle [acquiring a factor of $\gamma$ because the particles are now moving with speed $v$, and $\gamma=(1-v^{2})^{-1/2}$], and the energy density becomes $\gamma^{2}n_{0}m$ (acquiring a second factor of $\gamma$ because the region containing the particles in $S$ will have become Lorentz contracted in $S'$ by a factor of $\gamma$, increasing the density). Energy density therefore transforms with two factors of $\gamma$, and this indicates that it is part of a second-rank tensor.
To understand what this second-rank tensor could be, let's take a step back and think about particle current. Recall that the particle current can be expressed as a 4-vector $\boldsymbol{J}=n_{0}\boldsymbol{u}$, where $n_{0}$ is the density of dust particles in their rest frame and $\boldsymbol{u}$ is the 4-velocity of the assembly of dust particles. The time component of this current tells us about the number density $n=\gamma n_{0}$ of the particles [remember from Chapter 2 that $\boldsymbol{u}=\gamma(1,\vec{v})$, so $\boldsymbol{J}=\gamma n_{0}(1,\vec{v})$], and each spatial component of this current tells us the flux of particles along that direction (e.g. in Cartesian coordinates, $J^{x}$ tells us about the number of particles crossing the $yz$ plane, per unit area, per unit time).
This is all useful for thinking about the flux of particles, but what if we want to understand the flux of 4-momentum? That's a really interesting question because we would like to know how energy and momentum are transported across spacetime. The problem is that, unlike the number of particles, which is a scalar, the 4-momentum is a 4-vector, and so its flux has to be a more complicated object than a 4-vector. This suggests that the object needed to describe the flux of momentum will need to be a${}^{20}$ second-rank tensor, since it depends on both the 4-momentum and the 4-current. We call this object the energy-momentum tensor${}^{21}$ $\boldsymbol{T}(\;,\;)$, and it has two slots (or, in components, it carries two indices). We will define it (for now) as the symmetric tensor
and from this it's readily apparent that $\boldsymbol{T}(\;,\;)$ is a $(0,2)$ object that inputs two vectors and has components $T_{\mu\nu}=\boldsymbol{T}(\boldsymbol{e}_{\mu},\boldsymbol{e}_{\nu})=J_{\mu}p_{\nu}$. We can, of course, rewrite this tensor in other ways, and we could define it as a symmetric $(2,0)$ object, with upstairs indices on its components, $T^{\mu\nu}$.${}^{22}$
Example 4.10
To see what $\boldsymbol{T}$ looks like in practice, let's stick with components to begin with and evaluate everything in a frame in which the number density is $J^{0}\equiv n=\gamma n_{0}$. The $(2,0)$ version of the energy-momentum tensor for the cloud of particles can then be written as $T^{\mu\nu}=J^{\mu}p^{\nu}=(n_{0}u^{\mu})(m u^{\nu})$,$^{23}$ where $\boldsymbol{u}$ is the velocity of the cloud with components $u^{\mu}$.
The time-time element $T^{00}$ is then just the energy $p^{0}=\gamma m$ multiplied by $J^{0}=\gamma n_{0}=n$, and hence $T^{00}=\gamma n m$ is equal to the energy density.
The space-time and time-space elements $T^{i0}$ and $T^{0i}$ are $n\gamma m v^{i}$ and hence correspond to the density of the $i$th component of the momentum.
The space-space elements $T^{ij}$ are $n\gamma m v^{i}v^{j}$ and are momentum fluxes which, as discussed below, correspond to stresses.
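The component results of Example 4.10 can be checked with a short numerical sketch of $T^{\mu\nu}=J^{\mu}p^{\nu}$. It assumes units with $c=1$ and uses made-up values of $n_{0}$, $m$ and $\vec{v}$; the function name `dust_T_up` is hypothetical.

```python
import math

def dust_T_up(n0, m, v):
    """Return (T^{mu nu}, gamma) with T^{mu nu} = J^mu p^nu for dust of rest-frame
    number density n0 and particle mass m moving with 3-velocity v (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v))
    u = [gamma] + [gamma * vi for vi in v]   # u^mu = gamma (1, v^i)
    J = [n0 * c for c in u]                  # J^mu = n0 u^mu = (n, n v^i)
    p = [m * c for c in u]                   # p^mu = m u^mu = (gamma m, gamma m v^i)
    return [[J[a] * p[b] for b in range(4)] for a in range(4)], gamma

n0, m, v = 2.0, 3.0, (0.6, 0.0, 0.0)        # illustrative values
T, gamma = dust_T_up(n0, m, v)
n = gamma * n0                               # number density in this frame

assert math.isclose(T[0][0], gamma * n * m)                # energy density
assert math.isclose(T[0][1], n * gamma * m * v[0])         # momentum density
assert math.isclose(T[1][0], T[0][1])                      # symmetric
assert math.isclose(T[1][1], n * gamma * m * v[0] ** 2)    # momentum flux (stress)
assert T[2][2] == 0.0                                      # no flow along y
```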
Another way of looking at the components is to say that the energy-momentum tensor $T^{\mu\nu}$ tells us the flux of the 4-momentum component $p^{\mu}$ that crosses a surface of constant $x^{\nu}$. In particular, this means that
$T^{00}$ is the energy density, since it is the flux of $p^{0}$ (energy) crossing a surface of constant time (i.e. filling space).
$T^{0i}=T^{i0}$ is the mass flux across a surface of constant $x^{i}$, which is equivalent to the density of the $i$th component of linear momentum.
$T^{ij}$ is the $ij$ component of the usual stress tensor, meaning that the off-diagonal terms are shear stresses and the diagonal terms ($T^{ii}$) correspond to pressures.
Two very simple examples of this tensor are as follows:
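(1) A set of dust particles at rest. These only have energy density, and are not moving and so have no linear momentum. Hence, in their rest frame (where $\gamma=1$ and $n=n_{0}$), we have
\begin{equation*}
T^{\mu\nu}=\begin{pmatrix} m n_{0} & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 \end{pmatrix}.
\end{equation*}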
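(2) An isotropic fluid in equilibrium. The particles in the fluid exert$^{24}$ a pressure $p$, but have no preferred direction (meaning that $T^{i0}=0$ and $T^{ij}$ has no off-diagonal components). Hence, in the rest frame of the fluid, we have
\begin{equation*}
T^{\mu\nu}=\begin{pmatrix} \rho & 0 & 0 & 0\\ 0 & p & 0 & 0\\ 0 & 0 & p & 0\\ 0 & 0 & 0 & p \end{pmatrix},
\end{equation*}
where $\rho=T^{00}$ is the energy density of the fluid.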
We will return to this problem in much more detail in Chapter 12.
$^{20}$ Here we mean a tensor with $(m,n)=(2,0)$ or $(m,n)=(0,2)$. (In general a second-rank tensor has $m+n=2$, so the mixed $(1,1)$ form is also second rank.)
$^{21}$ This is also known as the stress-energy tensor.
$^{22}$ That is, we transform the components using $T^{\mu\nu}=\eta^{\mu\alpha}\eta^{\nu\beta}T_{\alpha\beta}$.
Since $\eta_{\mu\nu}=\eta^{\mu\nu}$ is a diagonal tensor with components $\operatorname{diag}(-1,1,1,1)$, we see that raising or lowering a timelike index earns us a minus sign. This allows us to see immediately that $T^{00}=T_{00}$, $T^{ii}=T_{ii}$, $T^{0i}=-T_{0i}$ and $T^{i0}=-T_{i0}$. This is a good example of a tensor for which the components can change when you move the indices from upstairs to downstairs (in contrast to $\eta_{\mu\nu}$, for which they do not, as explained on page 45).
$^{23}$ Reminder: we use $u^{\mu}=(\gamma,\gamma v^{i})$, $J^{\mu}=n_{0}u^{\mu}=(n,nv^{i})$ and $p^{\mu}=mu^{\mu}=(\gamma m,\gamma m v^{i})$.
$^{24}$ Do not confuse pressure $p$ with momentum. The context should always make it clear, but it's a shame the two quantities have the same symbol.
$^{25}$ We need our previous results that, for an observer with velocity $\boldsymbol{u}$, $\tilde{\boldsymbol{p}}(\boldsymbol{u})=-E$ and $\tilde{\boldsymbol{J}}(\boldsymbol{u})=-n$.
$^{26}$ In components, eqn 4.58 is $T_{\mu\nu}u^{\mu}u^{\nu}=nE$.
$^{27}$ This is guaranteed because the tensor is symmetric, i.e. $\boldsymbol{T}(\boldsymbol{u},\boldsymbol{a})=\boldsymbol{T}(\boldsymbol{a},\boldsymbol{u})$. In components, one could write eqns 4.59 and 4.60 as $T_{\mu\nu}u^{\mu}a^{\nu}=-np_{\mu}a^{\mu}$ and $T_{\mu\nu}a^{\mu}u^{\nu}=-EJ_{\mu}a^{\mu}$.
These quantities are equal because, recalling that the observer is travelling with velocity $\boldsymbol{u}$, we have $\boldsymbol{J}=n_{0}\boldsymbol{v}=n\boldsymbol{p}/[\gamma(u)m]$ and $E=\gamma(u)m$, where $u$ is the speed of the observer relative to the rest frame of the swarm.
$^{28}$ In components, $T_{\mu\nu}u^{\mu}=-np_{\nu}$.
Finally, let's use the elegant formalism of our slot machines to consider the energy-momentum tensor as a machine with two slots in it. This will allow us to read off the properties of our set of dust particles in the frame of an observer travelling with velocity vector $\boldsymbol{u}$. Our dust particles are described with a momentum 1-form $\tilde{\boldsymbol{p}}(\,)$ and so, following the results proved in Example 4.5, if we insert a velocity vector $\boldsymbol{u}$ into $\tilde{\boldsymbol{p}}(\,)$ then we will output (with a minus sign) the energy $E$ of the particle, as measured by an observer with velocity vector $\boldsymbol{u}$:
\begin{equation*}
\tilde{\boldsymbol{p}}(\boldsymbol{u})=-E.
\end{equation*}
The particle current 1-form is $\tilde{\boldsymbol{J}}(\,)$. Insert a velocity $\boldsymbol{u}$ and, with a minus sign, we output the number density of particles measured by the observer with velocity $\boldsymbol{u}$:
\begin{equation*}
\tilde{\boldsymbol{J}}(\boldsymbol{u})=-n.
\end{equation*}
Example 4.11
A swarm of massive particles has a particle current $\boldsymbol{J}=n_{0}\boldsymbol{v}$, where $n_{0}$ is the number density of particles in the rest frame of the swarm; each particle has rest mass $m$, velocity $\boldsymbol{v}$ and momentum $\boldsymbol{p}=m\boldsymbol{v}$. The mass density of the swarm in its rest frame is $\rho_{0}=mn_{0}$. The $(0,2)$ energy-momentum tensor in this case is given by
\begin{equation*}
\boldsymbol{T}(\,,\,)=\tilde{\boldsymbol{J}}\otimes\tilde{\boldsymbol{p}}=\rho_{0}\,\tilde{\boldsymbol{v}}\otimes\tilde{\boldsymbol{v}}.
\end{equation*}
Here $\tilde{\boldsymbol{v}}=v_{\mu}\boldsymbol{\omega}^{\mu}$ is the velocity 1-form for the fluid. This energy-momentum tensor has components $T_{\mu\nu}=\rho_{0}v_{\mu}v_{\nu}$. We can insert some vectors into the slots in order to understand the physical meaning of the components of $\boldsymbol{T}$.$^{25}$
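(i) We start by inserting the observer's velocity vector $\boldsymbol{u}$ in both slots:
\begin{equation*}
\boldsymbol{T}(\boldsymbol{u},\boldsymbol{u})=\tilde{\boldsymbol{J}}(\boldsymbol{u})\,\tilde{\boldsymbol{p}}(\boldsymbol{u})=(-n)(-E)=nE, \tag{4.58}
\end{equation*}
and the output$^{26}$ is the energy density in the observer's rest frame.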
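(ii) Now enter the velocity and a dimensionless vector $\boldsymbol{a}$ to find
\begin{equation*}
\boldsymbol{T}(\boldsymbol{u},\boldsymbol{a})=\tilde{\boldsymbol{J}}(\boldsymbol{u})\,\tilde{\boldsymbol{p}}(\boldsymbol{a})=-n\,\tilde{\boldsymbol{p}}(\boldsymbol{a}), \tag{4.59}
\end{equation*}
which is (minus) the momentum density pointing along vector $\boldsymbol{a}$, as measured by the observer.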
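(iii) Putting the vectors in the other way round, we find
\begin{equation*}
\boldsymbol{T}(\boldsymbol{a},\boldsymbol{u})=\tilde{\boldsymbol{J}}(\boldsymbol{a})\,\tilde{\boldsymbol{p}}(\boldsymbol{u})=-E\,\tilde{\boldsymbol{J}}(\boldsymbol{a}), \tag{4.60}
\end{equation*}
which is (minus) the energy transported along the direction $\boldsymbol{a}$, according to the observer. This expression is equal to the momentum density along $\boldsymbol{a}$ found in eqn 4.59.$^{27}$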
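(iv) Now try entering a single velocity vector into one slot:
\begin{equation*}
\boldsymbol{T}(\boldsymbol{u},\,)=\tilde{\boldsymbol{J}}(\boldsymbol{u})\,\tilde{\boldsymbol{p}}(\,)=-n\,\tilde{\boldsymbol{p}}(\,),
\end{equation*}
which$^{28}$ gives (minus) the 4-momentum density 1-form in the rest frame of the observer.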
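The four slot-machine results of Example 4.11 can be mimicked by building $\boldsymbol{T}(\,,\,)=\tilde{\boldsymbol{J}}\otimes\tilde{\boldsymbol{p}}$ as a Python function of two vectors. This is a sketch assuming $c=1$, $\eta_{\mu\nu}=\operatorname{diag}(-1,1,1,1)$ and motion along $x$ only; the helper names, masses and speeds are illustrative. It also checks that the measured $E$ and $n$ carry the Lorentz factor of the relative speed of swarm and observer, which is what the equality of eqns 4.59 and 4.60 relies on.

```python
import math

ETA_DIAG = (-1.0, 1.0, 1.0, 1.0)   # eta_{mu nu} = diag(-1, 1, 1, 1), c = 1

def four_velocity(vx):
    """u^mu for motion along x with speed vx."""
    g = 1.0 / math.sqrt(1.0 - vx * vx)
    return [g, g * vx, 0.0, 0.0]

def one_form(vec_up):
    """Return the 1-form w~( ) = eta(vec, ) as a function of one vector."""
    low = [ETA_DIAG[mu] * vec_up[mu] for mu in range(4)]
    return lambda a: sum(low[mu] * a[mu] for mu in range(4))

def energy_momentum(J_up, p_up):
    """Return the slot machine T( , ) = J~ (x) p~ for dust."""
    J_t, p_t = one_form(J_up), one_form(p_up)
    return lambda a, b: J_t(a) * p_t(b)

m, n0 = 2.0, 3.0                          # illustrative particle mass and rest density
v_swarm, v_obs = 0.5, -0.2                # illustrative speeds along x
v = four_velocity(v_swarm)
J_up, p_up = [n0 * c for c in v], [m * c for c in v]
T = energy_momentum(J_up, p_up)

u = four_velocity(v_obs)                  # observer's 4-velocity
a = [0.0, 1.0, 0.0, 0.0]                  # dimensionless vector along x
n = -one_form(J_up)(u)                    # J~(u) = -n
E = -one_form(p_up)(u)                    # p~(u) = -E

# Relative speed of swarm and observer, by relativistic velocity subtraction
v_rel = (v_swarm - v_obs) / (1.0 - v_swarm * v_obs)
gamma_rel = 1.0 / math.sqrt(1.0 - v_rel ** 2)

assert math.isclose(E, gamma_rel * m) and math.isclose(n, gamma_rel * n0)
assert math.isclose(T(u, u), n * E)                     # (i)   eqn 4.58
assert math.isclose(T(u, a), -n * one_form(p_up)(a))    # (ii)  eqn 4.59
assert math.isclose(T(a, u), -E * one_form(J_up)(a))    # (iii) eqn 4.60
assert math.isclose(T(a, u), T(u, a))                   # symmetry, footnote 27
```

Representing 1-forms and $\boldsymbol{T}$ as functions (rather than arrays) keeps the code close to the slot-machine picture: a tensor is literally something you feed vectors into.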
Chapter summary
1-forms can be viewed as a set of equally spaced planes. They combine with vectors to form numbers.
Tensors are linear slot machines that input vectors and 1-forms and output numbers.
The energy-momentum tensor $\boldsymbol{T}$ is a $(0,2)$ [or $(2,0)$] symmetric tensor with components $T_{\mu\nu}$ [or $T^{\mu\nu}$]. It tells us about the flux of the 4-momentum, and the time-time component $T^{00}$ gives us access to the energy density.
Exercises
(4.1) A tensor $\boldsymbol{W}$ has components $W^{\alpha\beta}=u^{\alpha}v^{\beta}$, where $\boldsymbol{u}$ and $\boldsymbol{v}$ are 4-vectors. Show that $\boldsymbol{W}$ transforms properly as a tensor.
(4.2) Show that $\delta^{\alpha}{}_{\beta}$ transforms properly as a tensor. What about $\delta_{\alpha\beta}$ and $\delta^{\alpha\beta}$?
(4.3) Show that if you take a tensor $\boldsymbol{S}(\,,\,)$ with components $S^{\mu}{}_{\nu}$, you can construct a Lorentz-invariant scalar by evaluating $S^{\mu}{}_{\mu}$. (Remember that, using the summation convention, this involves evaluating $\sum_{\mu=0}^{3}S^{\mu}{}_{\mu}$.)
(4.4) Consider flat space in spherical polar coordinates. (a) Compute the basis 1-forms $\boldsymbol{\omega}^{r}$, $\boldsymbol{\omega}^{\theta}$ and $\boldsymbol{\omega}^{\phi}$ in terms of the Cartesian basis 1-forms $\boldsymbol{\omega}^{x}$, $\boldsymbol{\omega}^{y}$ and $\boldsymbol{\omega}^{z}$.
(b) Write basis vectors $\boldsymbol{e}_{r}$, $\boldsymbol{e}_{\theta}$ and $\boldsymbol{e}_{\phi}$ in terms of the usual Cartesian basis $\boldsymbol{e}_{x}$, $\boldsymbol{e}_{y}$ and $\boldsymbol{e}_{z}$.
(c) Using these results, show that $\langle\boldsymbol{\omega}^{\mu},\boldsymbol{e}_{\nu}\rangle=0$ for $\mu\neq\nu$, where the indices are $r$, $\theta$ and $\phi$.
(4.5) Consider a coordinate system $(u,v,w)$ related to Cartesian coordinates via
Compute (a) basis vectors and (b) basis 1-forms, in terms of the Cartesian $\boldsymbol{e}_{\mu}$ and $\boldsymbol{\omega}^{\mu}$.
(4.6) Consider the invariant
(4.8) The Doppler effect in special relativity. By considering the Lorentz-transformation properties of the wavevector 4-vector $\boldsymbol{k}$ with components $k^{\mu}=(\omega,\vec{k})$, derive the expression for the Doppler effect on the frequency of the wave, as predicted by special relativity.
5
5.1 Metrics in general
5.2 Meet some metrics
5.3 Light and light cones
Chapter summary
Exercises
The metric
$^{1}$ There's scope for confusion here, as a metric is an object into which we input two vectors. The order here is that we input a position in spacetime into the metric field and output a metric for that point in spacetime. We can then insert two vectors into that metric tensor and find their scalar product at that point in spacetime.
$^{2}$ It will turn out that the metric also features on the right-hand side of this equation, with the result that the equation is difficult to solve.
...with the measure you use, it will be measured to you (Matthew 7:2)
In the previous chapter, we learned that a tensor is a machine for turning vectors and 1-forms into scalars. We have also seen that the metric tensor $\boldsymbol{\eta}(\,,\,)$ is a $(0,2)$ tensor (meaning that you feed it two vectors and it spits out a number) that tells you how to obtain the scalar product of two vectors ($\boldsymbol{X}\cdot\boldsymbol{Y}=\eta_{\mu\nu}X^{\mu}Y^{\nu}$, from eqn 4.1). However, everything so far has been for flat Minkowski spacetime (i.e. for special, not general, relativity). It's time now to tackle gravity, and when we include that, spacetime becomes curved. We are working up to Einstein's theory of gravitation, which can be succinctly stated as
\begin{equation*}
\binom{\text { Curvature of }}{\text { spacetime }}=\binom{\text { Energy density of }}{\text { matter in spacetime }} \tag{5.1}
\end{equation*}
With curvature included, we need a more general metric, and we shall use the symbol $\boldsymbol{g}(\,,\,)$ for our general $(0,2)$ metric tensor [retaining $\boldsymbol{\eta}(\,,\,)$ for the special case of flat Minkowski spacetime]. The metric tensor $\boldsymbol{g}(\,,\,)$ will feed into the left-hand side of eqn 5.1. This chapter will focus on the form of $\boldsymbol{g}(\,,\,)$ for several different types of spacetime.
In flat spacetime, $\boldsymbol{g}(\,,\,)=\boldsymbol{\eta}(\,,\,)$ everywhere throughout spacetime. In curved spacetime, $\boldsymbol{g}(\,,\,)$ varies from point to point. This means that the metric is a field. A field is a quantity where we input a position in spacetime and output a tensor valid for that position in spacetime. For the metric field, we input a position in spacetime and output the appropriate metric tensor that allows us to take dot products at that point in spacetime.$^{1}$ Of course, the tensors at closely spaced points in spacetime will be related, and this relationship gives rise to a field theory: a theory that allows us to describe and predict changes in the fields as a function of space and time, and also to examine the consequences the field has on the physics. General relativity is the field theory of gravity. The left-hand side of its governing equation is based on the metric field: the field describing the geometry of spacetime.$^{2}$ We begin by restating some simple general facts about metric tensors and their components.
5.1 Metrics in general
For a general space, the metric at a point in spacetime $\boldsymbol{g}(\,,\,)$ is a $(0,2)$ slot machine that takes two vectors as input and outputs their scalar product.$^{3}$ Inserting vectors $\boldsymbol{X}$ and $\boldsymbol{Y}$ into the metric, we obtain
\begin{equation*}
\boldsymbol{g}(\boldsymbol{X},\boldsymbol{Y})=\boldsymbol{X}\cdot\boldsymbol{Y}=g_{\mu\nu}X^{\mu}Y^{\nu}.
\end{equation*}
The metric encodes distances in spacetime via an expression in terms of infinitesimal intervals between coordinates. Previously, we wrote the invariant line element in terms of the Minkowski metric as $\mathrm{d}s^{2}=\eta_{\mu\nu}\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}$. This hints at a simple way to write down the invariant infinitesimal length of a line element in any space, so we adopt it and write$^{4}$
\begin{equation*}
\mathrm{d}s^{2}=g_{\mu\nu}\,\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}.
\end{equation*}
This equation is a simple statement of how long an infinitesimal interval is in a particular spacetime geometry, specified by the components of the metric. It's often useful to write down this line-element equation and, since it contains all of the components of the metric $g_{\mu\nu}$, we often say that the line element is the metric. We can integrate $\mathrm{d}s$ to work out the total interval $s$ between two events. For example, the magnitude of the interval along a curve $x^{\mu}(\lambda)$ between events at points $\mathcal{A}$ and $\mathcal{B}$ can be worked out using the metric via the prescription
\begin{equation*}
s=\int_{\mathcal{A}}^{\mathcal{B}}\left|g_{\mu\nu}\frac{\mathrm{d}x^{\mu}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\nu}}{\mathrm{d}\lambda}\right|^{1/2}\mathrm{d}\lambda.
\end{equation*}
Example 5.1
Suppose we have two different coordinate systems. We can use the invariance of the line element to relate the components of the metric in the two systems. The line element is written as
That is, the components of the metric transform as we expect the components of a $(0,2)$ tensor to transform.
$^{3}$ We have also seen that it is possible to define the metric as a $(2,0)$ tensor, which is a slot machine that takes two 1-forms as input and outputs a scalar. In components, this gives the 'up' form of the metric $g^{\mu\nu}$. As explained in the previous chapter, the 'up' form of the metric is the inverse of the 'down' form, or
\begin{equation*}
g^{\mu\alpha}g_{\alpha\nu}=\delta^{\mu}{}_{\nu}.
\end{equation*}
$^{4}$ This is a useful example of the metric tensor acting as a slot machine. If we want to find the squared infinitesimal interval between positions, we insert $\mathrm{d}\boldsymbol{x}$ into both slots of the metric tensor and the output is exactly the invariant interval $\mathrm{d}s^{2}$ that we seek.
$^{5}$ Note that the line element in this form features a multiplication of the factors $\mathrm{d}x^{\mu}$. In this expression, the product does not represent an infinitesimal area, but rather the square of a length, and so the transformation simply requires a multiplication of the individual transformations. Later in the chapter we shall see how a transformation of a product that does represent an area or volume requires us to consider a Jacobian. This feature is discussed in more detail in Part V of the book (see, in particular, Chapter 38).
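The transformation rule of Example 5.1 can be checked numerically. The sketch below assumes the standard relation $x=r\cos\theta$, $y=r\sin\theta$ between Cartesian and cylindrical coordinates, applies $g'_{\mu\nu}=(\partial x^{\alpha}/\partial x'^{\mu})(\partial x^{\beta}/\partial x'^{\nu})\,\eta_{\alpha\beta}$ at one sample point, and recovers the flat-space cylindrical components, including $g_{\theta\theta}=r^{2}$. The function names and the sample point are illustrative.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def jacobian_cyl(r, theta):
    """L[alpha][mu] = d x^alpha / d x'^mu with x = (t, x, y, z) and x' = (t, r, theta, z)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[1, 0, 0, 0],
            [0, c, -r * s, 0],
            [0, s, r * c, 0],
            [0, 0, 0, 1]]

def transform_metric(g, L):
    """g'_{mu nu} = L^alpha_mu L^beta_nu g_{alpha beta} (the (0,2) tensor rule)."""
    n = len(g)
    return [[sum(L[a][mu] * L[b][nu] * g[a][b] for a in range(n) for b in range(n))
             for nu in range(n)]
            for mu in range(n)]

r, theta = 2.0, 0.7                          # illustrative sample point
g_cyl = transform_metric(ETA, jacobian_cyl(r, theta))
expected = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, r * r, 0], [0, 0, 0, 1]]
for mu in range(4):
    for nu in range(4):
        assert math.isclose(g_cyl[mu][nu], expected[mu][nu], abs_tol=1e-12)
```

The off-diagonal $r\theta$ entry cancels as $-r\sin\theta\cos\theta+r\sin\theta\cos\theta=0$, which is why the cylindrical metric stays diagonal.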
5.2 Meet some metrics
With some general rules recapped, let's now meet some metric line elements.$^{6}$
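We begin with the simplest and most familiar line element. This is the metric for Euclidean space in three dimensions, whose line element (in Cartesian coordinates) is simply
\begin{equation*}
\mathrm{d}s^{2}=\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}.
\end{equation*}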
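Special relativity is founded on the Minkowski metric in (3+1) dimensions (i.e. a combination of three spatial dimensions and one time dimension). In Cartesian coordinates, the line element for this metric is written as
\begin{equation*}
\mathrm{d}s^{2}=-\mathrm{d}t^{2}+\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}.
\end{equation*}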
We can also take the coordinate components to form a vector with components $\mathrm{d}x^{\mu}=(\mathrm{d}t,\mathrm{d}x,\mathrm{d}y,\mathrm{d}z)$ and write the line element as $\mathrm{d}s^{2}=g_{\mu\nu}\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}$, where
\begin{equation*}
g_{\mu\nu}=\begin{pmatrix} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.
\end{equation*}
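There's nothing stopping us working in cylindrical polar coordinates and describing the same flat Minkowski space using these.$^{7}$ The Minkowski line element in cylindrical polars is therefore
\begin{equation*}
\mathrm{d}s^{2}=-\mathrm{d}t^{2}+\mathrm{d}r^{2}+r^{2}\mathrm{d}\theta^{2}+\mathrm{d}z^{2}.
\end{equation*}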
We can also take the coordinate components to form a column vector with components $\mathrm{d}x^{\mu}=(\mathrm{d}t,\mathrm{d}r,\mathrm{d}\theta,\mathrm{d}z)$, and then the line element is $\mathrm{d}s^{2}=g_{\mu\nu}\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}$, where
\begin{equation*}
g_{\mu\nu}=\begin{pmatrix} -1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & r^{2} & 0\\ 0 & 0 & 0 & 1 \end{pmatrix}.
\end{equation*}
Example 5.2
Although the space that's described here is still flat, if we fix $r$ we have our first curved space: the two-dimensional surface of a sphere. Consider the surface of a sphere of circumference $2\pi a$. From the spherical polar example, we need only fix $r=a$ and the metric line element for this two-dimensional surface becomes
\begin{equation*}
\mathrm{d}s^{2}=a^{2}\mathrm{d}\theta^{2}+a^{2}\sin^{2}\theta\,\mathrm{d}\phi^{2}.
\end{equation*}
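A final example of a metric is the Newtonian limit of the metric that emerges from general relativity as a limiting case of the Einstein equation. This Newtonian-metric line element is written as
\begin{equation*}
\mathrm{d}s^{2}=-(1+2\Phi)\,\mathrm{d}t^{2}+(1-2\Phi)\left(\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}\right), \tag{5.22}
\end{equation*}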
where $\Phi$ is the gravitational potential. We have already motivated the $(1+2\Phi)$ term multiplying $\mathrm{d}t^{2}$ in eqn 2.75 (Example 2.13) where, by working in the limit $v\ll c$, we were only able to obtain the correction to the $\mathrm{d}t^{2}$ part of the metric. Equation 5.22 includes a very similar correction to the spatial part as well. Note that if the potential $\Phi$ is set to zero, this metric reverts to the Minkowski metric of flat space (eqn 5.10), as expected.
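Equation 5.22 can also be expressed by writing down the non-zero components of the metric tensor, $g_{00}=-(1+2\Phi)$ and $g_{11}=g_{22}=g_{33}=1-2\Phi$, or by writing down the components of the tensor as
\begin{equation*}
g_{\mu\nu}=\begin{pmatrix} -(1+2\Phi) & 0 & 0 & 0\\ 0 & 1-2\Phi & 0 & 0\\ 0 & 0 & 1-2\Phi & 0\\ 0 & 0 & 0 & 1-2\Phi \end{pmatrix}.
\end{equation*}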
Note that the potential $\Phi$ is assumed to be a function of position only: this limit works for static solutions, in which the potential does not vary with time.
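The Newtonian-limit line element can also be written in spherical polars:
\begin{equation*}
\mathrm{d}s^{2}=-(1+2\Phi)\,\mathrm{d}t^{2}+(1-2\Phi)\left(\mathrm{d}r^{2}+r^{2}\mathrm{d}\theta^{2}+r^{2}\sin^{2}\theta\,\mathrm{d}\phi^{2}\right).
\end{equation*}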
Notice how in the previous examples of metrics (other than the simple Minkowski metric line element in Cartesian coordinates) the components of the metric vary in space. That is, the metric is a function of the underlying coordinates. We should therefore write the metric components as a function $g_{\mu\nu}(x)$. The metric is now manifestly an example of a field. The metric field takes a spacetime coordinate $x^{\mu}$ as an input and outputs the metric tensor at that point.$^{9}$
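In code, a metric field is naturally a function from a spacetime point to a matrix of components. The sketch below does this for the weak-field line element of eqn 5.22, assuming (for illustration only) the point-mass potential $\Phi=-GM/r$ in units with $G=c=1$; `newtonian_metric` is a hypothetical name.

```python
import math

def newtonian_metric(x, GM=1.0):
    """Return g_{mu nu}(x) for ds^2 = -(1 + 2 Phi) dt^2 + (1 - 2 Phi)(dx^2 + dy^2 + dz^2),
    assuming Phi = -GM / r (illustrative) and x = (t, x, y, z)."""
    _, X, Y, Z = x                               # static: t is not used
    phi = -GM / math.sqrt(X * X + Y * Y + Z * Z)
    spatial = 1.0 - 2.0 * phi
    return [[-(1.0 + 2.0 * phi), 0.0, 0.0, 0.0],
            [0.0, spatial, 0.0, 0.0],
            [0.0, 0.0, spatial, 0.0],
            [0.0, 0.0, 0.0, spatial]]

near = newtonian_metric((0.0, 100.0, 0.0, 0.0))   # Phi = -0.01
later = newtonian_metric((7.0, 100.0, 0.0, 0.0))  # same point, later time
far = newtonian_metric((0.0, 1e12, 0.0, 0.0))     # Phi is almost zero

assert math.isclose(near[0][0], -0.98) and math.isclose(near[1][1], 1.02)
assert near == later                                   # no time dependence
assert math.isclose(far[0][0], -1.0) and math.isclose(far[3][3], 1.0)  # -> Minkowski
```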
↷ See Chapter 14 for a derivation of eqn 5.22.
$^{9}$ Here's a simple example of the use of the metric using the slot-machine picture. Working in flat space with cylindrical coordinates $X^{\mu}=(t,r,\theta,z)$, the distance between points separated by an interval $\mathrm{d}\boldsymbol{X}$ with components $\mathrm{d}X^{\mu}=(0,0,\mathrm{d}\theta,0)$ is found by evaluating
\begin{equation*}
\mathrm{d}s^{2}=\boldsymbol{g}(\mathrm{d}\boldsymbol{X},\mathrm{d}\boldsymbol{X})=g_{\theta\theta}\,\mathrm{d}\theta^{2}=r^{2}\mathrm{d}\theta^{2}.
\end{equation*}
Notice how the interval $\mathrm{d}s^{2}$ changes its size depending on the value of $r$ at which we evaluate the interval. This follows from $\boldsymbol{g}$ being a field: we input a position in spacetime and output the interval appropriate for that position. To compute an interval between points $\theta_{1}$ and $\theta_{2}$, separated by a larger interval (i.e. the separation is not infinitesimal), we then evaluate $\Delta s=\int\mathrm{d}s=\int_{\theta_{1}}^{\theta_{2}}\sqrt{g_{\theta\theta}}\,\mathrm{d}\theta=\int_{\theta_{1}}^{\theta_{2}}r\,\mathrm{d}\theta$.
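The recipe $\Delta s=\int\sqrt{g_{\theta\theta}}\,\mathrm{d}\theta$ generalizes to any curve $x^{\mu}(\lambda)$: evaluate the metric field at each point, contract it with the tangent and integrate. The sketch below is a minimal midpoint-rule version of that idea; the names, the step count and the finite-difference tangent are assumptions made for illustration.

```python
import math

def interval_length(metric, curve, lam_a, lam_b, steps=1000, h=1e-6):
    """Approximate the integral of sqrt(|g_{mu nu} dx^mu/dlam dx^nu/dlam|) dlam.

    metric(x) returns the components g_{mu nu} at the point x; curve(lam) returns x^mu.
    Uses the midpoint rule with a central-difference tangent (a sketch only).
    """
    dlam = (lam_b - lam_a) / steps
    total = 0.0
    for k in range(steps):
        lam = lam_a + (k + 0.5) * dlam
        x = curve(lam)
        tangent = [(p - q) / (2 * h) for p, q in zip(curve(lam + h), curve(lam - h))]
        g = metric(x)                            # the metric field evaluated at x
        ds2 = sum(g[i][j] * tangent[i] * tangent[j]
                  for i in range(len(x)) for j in range(len(x)))
        total += math.sqrt(abs(ds2)) * dlam
    return total

def flat_cylindrical(x):
    """Flat (t, r, theta, z) metric: g_thetatheta = r^2 depends on where we are."""
    t, r, theta, z = x
    return [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, r * r, 0], [0, 0, 0, 1]]

# Arc of fixed t, r = 2, z = 0 from theta = 0 to theta = pi/2: length r * pi / 2 = pi
arc = interval_length(flat_cylindrical, lambda lam: [0.0, 2.0, lam, 0.0], 0.0, math.pi / 2)
assert math.isclose(arc, math.pi, rel_tol=1e-6)
```

Taking the absolute value inside the square root lets the same routine return the proper time along a timelike world line, where $g_{\mu\nu}\dot{x}^{\mu}\dot{x}^{\nu}<0$.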
We'll use these ideas in the following sections.
$^{10}$ In fact, there are two different ways of visualizing a metric: you can draw its light cones, or embed a slice of it in a higher-dimensional space. The latter approach is discussed in Appendix D.
Fig. 5.1 Light cones in (a) flat spacetime, (b) Rindler spacetime and (c) baby-Eddington-Finkelstein coordinates.
5.3 Light and light cones
One way of visualizing a metric is to draw the light cones in the spacetime that it describes. Light cones are absolute surfaces that separate timelike and spacelike intervals. We use local light cones to visualize spaces because, in curved spacetime, the light cones change their orientation as a function of position in spacetime. The pattern of local light cones represents almost all of the structure of spacetime.$^{10}$ Recall that the infinitesimal interval between two events on the photon's world line satisfies
\begin{equation*}
\mathrm{d}s^{2}=g_{\mu\nu}\,\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}=0.
\end{equation*}
Using this condition, we can work out the orientation of the light cones in whichever coordinates we've chosen to use. The orientation is important as it reveals the signals that observers can send and receive at each point. We investigate some examples below.
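For a two-dimensional slice with line element $\mathrm{d}s^{2}=g_{tt}\,\mathrm{d}t^{2}+2g_{tx}\,\mathrm{d}t\,\mathrm{d}x+g_{xx}\,\mathrm{d}x^{2}$, setting $\mathrm{d}s^{2}=0$ and dividing by $\mathrm{d}x^{2}$ gives a quadratic for the slope $s=\mathrm{d}t/\mathrm{d}x$ of the two edges of the light cone. The sketch below solves it; the function name and the convention of returning `math.inf` for an edge with $\mathrm{d}x=0$ are choices made for this sketch.

```python
import math

def null_slopes(g_tt, g_tx, g_xx):
    """Slopes dt/dx of the two null directions of
    ds^2 = g_tt dt^2 + 2 g_tx dt dx + g_xx dx^2 = 0, in ascending order.

    Dividing by dx^2 gives g_tt s^2 + 2 g_tx s + g_xx = 0 with s = dt/dx.
    """
    if g_tt == 0:
        # Linear case: one finite root; the other null direction has dx = 0.
        return (-g_xx / (2 * g_tx), math.inf)
    disc = g_tx * g_tx - g_tt * g_xx
    if disc < 0:
        raise ValueError("no real null directions: metric is not Lorentzian here")
    root = math.sqrt(disc)
    s1, s2 = (-g_tx - root) / g_tt, (-g_tx + root) / g_tt
    return tuple(sorted((s1, s2)))

print(null_slopes(-1.0, 0.0, 1.0))   # flat spacetime: (-1.0, 1.0)
```

As a usage example, the line element $-x\,\mathrm{d}v^{2}+2\,\mathrm{d}v\,\mathrm{d}x$ of eqn 5.36 corresponds to `null_slopes(-x, 1.0, 0.0)` with $v$ playing the role of $t$.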
Example 5.3
(a) Light in flat Minkowski spacetime obeys
\begin{equation*}
\mathrm{d}s^{2}=-\mathrm{d}t^{2}+\left(\mathrm{d}|\vec{r}|\right)^{2}=0,
\end{equation*}
and so $\mathrm{d}t=\pm\mathrm{d}|\vec{r}|$, which is integrated to give an equation for the light cones emerging from a point $(t_{0},r_{0})$, with the result that light cones obey the coordinate equation
\begin{equation*}
t-t_{0}=\pm\left(|\vec{r}|-r_{0}\right).
\end{equation*}
The light cones look the same everywhere [see Fig. 5.1(a), where we just draw the forward light cones for simplicity]. This uniformity of the light cones throughout spacetime is exactly as we stated earlier [see Fig. 3(a)]. A useful innovation at this point is the use of so-called light-cone coordinates $u=t-x$ and $v=t+x$, in terms of which the line element in one spatial dimension becomes
\begin{equation*}
\mathrm{d} s^{2}=-\mathrm{d} u \mathrm{~d} v \tag{5.32}
\end{equation*}
from which we can immediately read off that the light cones are coincident with the lines of constant $u$ and $v$ [see Fig. 5.1(a)].
Let's try other spacetimes. (For this particular exercise, we simply pluck these from thin air, without derivation.)
(b) The first one is known as Rindler spacetime (it comes from a coordinate choice appropriate for accelerated observers) and has a metric line element
We see that the cones change their shape as a function of position as shown in Fig. 5.1(b).
(c) Let's try another spacetime. This one has a metric line element we'll call the baby-Eddington-Finkelstein metric
\begin{equation*}
\mathrm{d} s^{2}=-x \mathrm{~d} v^{2}+2 \mathrm{~d} v \mathrm{~d} x \tag{5.36}
\end{equation*}
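Along the light cones $\mathrm{d}s^{2}$ must vanish and, factorizing eqn 5.36 as $\mathrm{d}v\,(-x\,\mathrm{d}v+2\,\mathrm{d}x)=0$, we spot that they have $v=$ constant and also $\mathrm{d}v/\mathrm{d}x=2/x$, so we find
\begin{equation*}
v=2\ln|x|+\text{constant}.
\end{equation*}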
The light cones are shown in Fig. 5.1(c).
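Integrating $\mathrm{d}v/\mathrm{d}x=2/x$ from a point $(x_{0},v_{0})$ with $x_{0}>0$ gives $v=v_{0}+2\ln(x/x_{0})$. A small numerical sketch (midpoint-rule integration; the names and step count are illustrative) confirms that a ray traced step by step follows this curve, which is one way to draw the non-trivial edges of the cones in Fig. 5.1(c).

```python
import math

def trace_null_curve(slope, x0, v0, x1, steps=20_000):
    """Integrate dv/dx = slope(x) from (x0, v0) to x = x1 using the midpoint rule."""
    h = (x1 - x0) / steps
    v = v0
    for k in range(steps):
        v += h * slope(x0 + (k + 0.5) * h)
    return v

def baby_ef_slope(x):
    """Non-trivial null direction of ds^2 = -x dv^2 + 2 dv dx (valid for x != 0)."""
    return 2.0 / x

v_end = trace_null_curve(baby_ef_slope, 1.0, 0.0, 5.0)
assert math.isclose(v_end, 2.0 * math.log(5.0), rel_tol=1e-7)
```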
(d) Finally, we consider the slightly more complicated Eddington-Finkelstein metric$^{11}$
\begin{equation*}
\mathrm{d} s^{2}=-\left(1-\frac{2 G M}{r}\right) \mathrm{d} v^{2}+2 \mathrm{~d} v \mathrm{~d} r \tag{5.38}
\end{equation*}
Light cones again have $v=$ constant and also
\begin{equation*}
\frac{\mathrm{d} v}{\mathrm{~d} r}=2\left(1-\frac{2 G M}{r}\right)^{-1} \tag{5.39}
\end{equation*}
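Evaluating eqn 5.39 on either side of $r=2GM$ shows where the cones tip over: the slope $\mathrm{d}v/\mathrm{d}r$ of the non-trivial edge is positive for $r>2GM$, diverges at $r=2GM$ and becomes negative for $r<2GM$. The sketch below simply tabulates this, assuming units with $G=c=1$ and $M=1$ for illustration; `ef_slope` is a hypothetical name.

```python
def ef_slope(r, GM=1.0):
    """dv/dr of the non-constant-v null direction, eqn 5.39: 2 / (1 - 2GM/r).
    Undefined at r = 2GM, where the slope diverges."""
    return 2.0 / (1.0 - 2.0 * GM / r)

radii = [10.0, 4.0, 3.0, 1.5, 1.0]
for r in radii:
    side = "outside" if r > 2.0 else "inside"
    print(f"r = {r:4.1f} ({side} r = 2GM): dv/dr = {ef_slope(r):+.3f}")
```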
Being able to visualize spacetimes in this way allows us to understand some exotic spacetimes, such as that of the next example.
Example 5.4
The Alcubierre metric$^{12}$ is an attempt to build the spacetime that would result from the action of a warp drive. A warp drive, much discussed in science fiction, is a device that appears to allow faster-than-light travel (judged from the point of view of an observer in flat spacetime). Of course, the propagation of signals (and travellers) faster than $c$ is not allowed. Instead, the warp drive works by making a bubble of curved spacetime where the light cones are oriented differently to those in flat space. The mathematical construction of the bubble starts with a curve $x=x_{\mathrm{s}}(t)$, $y=z=0$, which has a tangent $v_{\mathrm{s}}=\mathrm{d}x_{\mathrm{s}}/\mathrm{d}t$. Now we construct a smooth bubble function $f(r_{\mathrm{s}})$, where $r_{\mathrm{s}}=\sqrt{(x-x_{\mathrm{s}})^{2}+y^{2}+z^{2}}$. By construction, this function has the property $f(0)=1$ and decreases as we move from the origin, vanishing for $r_{\mathrm{s}}>R$, where $R$ is some distance that sets the edge of the bubble. The Alcubierre metric is then written as
\begin{equation*}
\mathrm{d}s^{2}=-\mathrm{d}t^{2}+\left[\mathrm{d}x-v_{\mathrm{s}}f(r_{\mathrm{s}})\,\mathrm{d}t\right]^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}.
\end{equation*}
This all looks rather complicated, but is best understood using Fig. 5.2. Figure 5.2(a) shows the world line of the traveller between two distant points in spacetime with different values of the coordinate $x$. The warp drive creates a bubble in spacetime shown by the dotted region. We can examine the light-cone structure inside the bubble by setting $\mathrm{d}s^{2}=0$. Restricting to motion along $x$ (so $\mathrm{d}y=\mathrm{d}z=0$), we find that the light cones are given by
\begin{equation*}
\frac{\mathrm{d}x}{\mathrm{d}t}=v_{\mathrm{s}}f(r_{\mathrm{s}})\pm1.
\end{equation*}
In regions outside the bubble, where $f=0$, the light cones are the normal ones of flat spacetime. Inside the bubble, the function $f$ causes the light cones to tip over, as shown in Fig. 5.2(b). The traveller must always move inside her forward light cone, but we see that inside the bubble, the tipping of the light cones means that the tangent of the world line appears to give rise to a velocity greater than $c$, judged by the light cones outside of the warp bubble. As a result of the warp in spacetime, the traveller is able to travel vast distances that would require superluminal velocities in flat spacetime.$^{13}$
$^{11}$ We shall see this again when we examine the geometry of stars. It arises in the theory of spherically symmetric black holes.
$^{12}$ Miguel Alcubierre (1964-). The warp drive proposal originated in M. Alcubierre, Class. Quantum Grav. 11, L73 (1994).
Fig. 5.2 (a) The world line of a trip in spacetime from $(t_{1},\vec{x}_{1})$ to $(t_{2},\vec{x}_{2})$, surrounded by a warped bubble of spacetime. (b) The light-cone structure in the warped region of spacetime. The light cones tip over in the warped region.
$^{13}$ It's worth noting that warping spacetime in this way would require a source of negative energy, so is not something that is readily achievable!
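Setting $\mathrm{d}y=\mathrm{d}z=0$ and $\mathrm{d}s^{2}=0$ in the Alcubierre line element $\mathrm{d}s^{2}=-\mathrm{d}t^{2}+(\mathrm{d}x-v_{\mathrm{s}}f\,\mathrm{d}t)^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}$ gives $\mathrm{d}x/\mathrm{d}t=v_{\mathrm{s}}f\pm1$ for the edges of the light cone. The sketch below evaluates this with a hypothetical bubble profile $f(r_{\mathrm{s}})=\tfrac{1}{2}[1+\cos(\pi r_{\mathrm{s}}/R)]$ for $r_{\mathrm{s}}<R$ (and zero beyond), chosen only because it has $f(0)=1$, decreases and vanishes at the edge; Alcubierre's own profile is built from tanh functions. The bubble speed $v_{\mathrm{s}}=3$ is likewise illustrative.

```python
import math

def bubble(r_s, R=1.0):
    """Hypothetical bubble profile: f(0) = 1, decreasing, and f = 0 for r_s >= R."""
    if r_s >= R:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * r_s / R))

def cone_edges(x, x_s, v_s, R=1.0):
    """Slopes dx/dt of the two light-cone edges on the line y = z = 0."""
    f = bubble(abs(x - x_s), R)
    return (v_s * f - 1.0, v_s * f + 1.0)

v_s, x_s = 3.0, 0.0                       # illustrative bubble speed and centre
for x in (0.0, 0.5, 0.9, 2.0):
    left, right = cone_edges(x, x_s, v_s)
    print(f"x = {x:3.1f}: dx/dt from {left:+.3f} to {right:+.3f}")
```

Wherever $v_{\mathrm{s}}f>1$, both edges have $\mathrm{d}x/\mathrm{d}t>1$, so a traveller moving inside her local light cone still advances faster than light would in the flat region outside.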
5.4 Lengths, areas, volumes
Once we have a metric in the form of a line element, we can use it to calculate not only lengths (or intervals), but also areas and volumes.
Example 5.5
Consider the metric for flat space, given in terms of spherical polars, with line element
\begin{equation*}
\mathrm{d}s^{2}=-\mathrm{d}t^{2}+\mathrm{d}r^{2}+r^{2}\mathrm{d}\theta^{2}+r^{2}\sin^{2}\theta\,\mathrm{d}\phi^{2}.
\end{equation*}
We can use this to find the radius of a sphere.$^{14}$ We fix $t$, $\theta$ and $\phi$ so that $\mathrm{d}t=\mathrm{d}\theta=\mathrm{d}\phi=0$, with the result that $\mathrm{d}s^{2}=\mathrm{d}r^{2}$. We then integrate the line element $\mathrm{d}s$ from $r=0$ to $r=R$:
\begin{equation*}
\int_{r=0}^{R}\mathrm{d}s=\int_{r=0}^{R}\sqrt{g_{rr}}\,\mathrm{d}r=\int_{0}^{R}\mathrm{d}r=R.
\end{equation*}
$^{14}$ Recall that we can write this as $\mathrm{d}s^{2}=g_{tt}\mathrm{d}t^{2}+g_{rr}\mathrm{d}r^{2}+g_{\theta\theta}\mathrm{d}\theta^{2}+g_{\phi\phi}\mathrm{d}\phi^{2}$. In this example, we use the fact that if $\mathrm{d}t=\mathrm{d}\theta=\mathrm{d}\phi=0$, then $\mathrm{d}s=\sqrt{g_{rr}}\,\mathrm{d}r$, and so on.
Fig. 5.3 A sphere of radius $R$. A circle on the surface of the sphere has a radius $r_{0}$ and is a line of fixed $\theta=\theta_{0}$.
This is no surprise, but shows the general principle of how to manipulate the metric to extract a length. In the same way, we can find the radius and circumference of a circle (Fig. 5.3) which has a fixed value of $\theta=\theta_{0}$ on the sphere. First, let's work out $r_{0}$, which is the distance from the North pole to the circle, measured along the surface. We start by fixing $t$, $r$ and $\phi$ (so that $\mathrm{d}s^{2}=R^{2}\mathrm{d}\theta^{2}$) and then integrating $\mathrm{d}s$ to find
\begin{equation*}
\int_{\theta=0}^{\theta_{0}} \mathrm{~d} s=\int_{\theta=0}^{\theta_{0}} \sqrt{g_{\theta \theta}(r=R)} \mathrm{d} \theta=\int_{\theta=0}^{\theta_{0}} R \mathrm{~d} \theta=\theta_{0} R \tag{5.44}
\end{equation*}
Again, an unsurprising result (elementary geometry tells us that $\theta_{0}R=r_{0}$). The circumference of this circle is calculated using $\mathrm{d}s^{2}=g_{\phi\phi}(r=R,\theta=\theta_{0})\,\mathrm{d}\phi^{2}=R^{2}\sin^{2}\theta_{0}\,\mathrm{d}\phi^{2}$ and so
\begin{equation*}
\int_{\phi=0}^{2 \pi} \mathrm{~d} s=\int_{\phi=0}^{2 \pi} \sqrt{g_{\phi \phi}\left(R, \theta_{0}\right)} \mathrm{d} \phi=2 \pi R \sin \theta_{0}=2 \pi r_{0} \operatorname{sinc} \frac{r_{0}}{R} \tag{5.45}
\end{equation*}
agreeing with eqn 3.20.
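The two integrals of eqns 5.44 and 5.45 can be reproduced numerically by summing $\mathrm{d}s=\sqrt{g}\,\mathrm{d}(\text{coordinate})$ along each path. In this sketch (the sample $R$, $\theta_{0}$ and step count are illustrative) the integrands happen to be constant, so the sums are essentially exact; the last check shows the circumference falls short of the flat-space value $2\pi r_{0}$.

```python
import math

def sinc(x):
    """Unnormalized sinc, sin(x)/x, as used in eqn 5.45."""
    return math.sin(x) / x if x != 0 else 1.0

def latitude_circle(R, theta0, steps=1000):
    """Return (r0, C) for the circle theta = theta0 on a sphere of radius R,
    by summing ds = sqrt(g_thth) dtheta and ds = sqrt(g_phph) dphi at r = R."""
    g_thth = R ** 2                               # g_{theta theta}(r = R)
    g_phph = (R * math.sin(theta0)) ** 2          # g_{phi phi}(r = R, theta = theta0)
    dtheta, dphi = theta0 / steps, 2 * math.pi / steps
    r0 = sum(math.sqrt(g_thth) * dtheta for _ in range(steps))
    C = sum(math.sqrt(g_phph) * dphi for _ in range(steps))
    return r0, C

R, theta0 = 2.0, 0.9                              # illustrative sphere and circle
r0, C = latitude_circle(R, theta0)
assert math.isclose(r0, theta0 * R)                        # eqn 5.44
assert math.isclose(C, 2 * math.pi * r0 * sinc(r0 / R))    # eqn 5.45
assert C < 2 * math.pi * r0                                # smaller than on a flat plane
```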
We notice from these examples that the (proper) length of an infinitesimal segment of coordinate $x^{1}$ is given by $\mathrm{d}l^{1}=\sqrt{g_{11}}\,\mathrm{d}x^{1}$. We can use this fact to work out how to calculate areas and volumes.
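As an illustration of building areas from proper lengths, the sketch below multiplies $\mathrm{d}l^{\theta}=\sqrt{g_{\theta\theta}}\,\mathrm{d}\theta$ and $\mathrm{d}l^{\phi}=\sqrt{g_{\phi\phi}}\,\mathrm{d}\phi$ for the flat-space spherical polar metric at fixed $r=R$ and sums over the sphere with a midpoint rule. The total should approach $4\pi R^{2}$; the function name and grid size are illustrative.

```python
import math

def sphere_area(R, n=400):
    """Sum dl^theta * dl^phi = sqrt(g_thth g_phph) dtheta dphi over a sphere of radius R."""
    dtheta, dphi = math.pi / n, 2 * math.pi / n
    area = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta                # midpoint in theta
        g_thth, g_phph = R ** 2, (R * math.sin(theta)) ** 2
        # the integrand does not depend on phi, so the n phi-cells contribute equally
        area += n * math.sqrt(g_thth * g_phph) * dtheta * dphi
    return area

assert math.isclose(sphere_area(2.0), 4 * math.pi * 2.0 ** 2, rel_tol=1e-4)
```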
Example 5.6
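Consider the special case of a diagonal$^{15}$ metric, with line element
\begin{equation*}
\mathrm{d}s^{2}=g_{00}(\mathrm{d}x^{0})^{2}+g_{11}(\mathrm{d}x^{1})^{2}+g_{22}(\mathrm{d}x^{2})^{2}+g_{33}(\mathrm{d}x^{3})^{2}.
\end{equation*}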
matrix of metric components $g_{\mu\nu}$. A diagonal metric has a line element that features the squares of intervals such as $(\mathrm{d}x^{\mu})^{2}$, but not mixed components such as $\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}$ with $\mu\neq\nu$.
$^{16}$ This is one example of an area. We can also form areas from other pairs of coordinates. For our example of a sphere, this choice turns out to be a sensible one since it yields an element of surface area.
\begin{equation*}
\mathrm{d}V=\mathrm{d}l^{1}\,\mathrm{d}l^{2}\,\mathrm{d}l^{3}=\sqrt{g_{11}g_{22}g_{33}}\,\mathrm{d}x^{1}\mathrm{d}x^{2}\mathrm{d}x^{3}.
\end{equation*}
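As an example, let's consider the metric in spherical polar coordinates. The area element on a sphere of radius $r$ is, using eqn 5.47, given by
\begin{equation*}
\mathrm{d}A=\sqrt{g_{\theta\theta}g_{\phi\phi}}\,\mathrm{d}\theta\,\mathrm{d}\phi=r^{2}\sin\theta\,\mathrm{d}\theta\,\mathrm{d}\phi.
\end{equation*}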
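For an element of 4-volume [i.e. a volume of (3+1)-dimensional spacetime], we need to take account of the fact that the timelike component of all Lorentz metrics comes with a minus sign. We therefore write
\begin{equation*}
\mathrm{d}\mathcal{V}=\sqrt{-g_{00}g_{11}g_{22}g_{33}}\,\mathrm{d}x^{0}\mathrm{d}x^{1}\mathrm{d}x^{2}\mathrm{d}x^{3},
\end{equation*}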
and since $g_{00}$ is negative, we take the square root of a positive quantity. Note that the product $g_{00}g_{11}g_{22}g_{33}$ is the determinant of the diagonal metric matrix.$^{17}$
We now show that this result generalizes to cases where we don't have a diagonal metric, which is to say that an element of $(3+1)$-dimensional volume, known as a 4-volume, is given by
\begin{equation*}
\mathrm{d} \mathcal{V}=\sqrt{-\operatorname{det} \boldsymbol{g}} \mathrm{d}^{4} x \tag{5.53}
\end{equation*}
where we note that the determinant det g\operatorname{det} \boldsymbol{g} is often simply written gg, so we would write dV=sqrt(-g)d^(4)x\mathrm{d} \mathcal{V}=\sqrt{-g} \mathrm{~d}^{4} x, with d^(4)x=dx^(0)dx^(1)dx^(2)dx^(3)\mathrm{d}^{4} x=\mathrm{d} x^{0} \mathrm{~d} x^{1} \mathrm{~d} x^{2} \mathrm{~d} x^{3}. As we shall see below, the 4 -volume dV\mathrm{d} \mathcal{V} is an invariant.
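To see eqn 5.53 at work for a metric that is not diagonal, take flat spacetime (with $c=1$) in the light-cone coordinates $u=t-x$, $v=t+x$, for which $\mathrm{d} s^{2}=-\mathrm{d} u \mathrm{~d} v+\mathrm{d} y^{2}+\mathrm{d} z^{2}$, so that $g_{u v}=g_{v u}=-\frac{1}{2}$. This choice of coordinates is our own illustration. The sketch computes $\operatorname{det} \boldsymbol{g}$ by cofactor expansion and checks that $\sqrt{-g}=\frac{1}{2}$ matches the Jacobian $\partial(t, x) / \partial(u, v)$ relating $\mathrm{d} t \mathrm{~d} x$ to $\mathrm{d} u \mathrm{~d} v$, so both coordinate systems assign the same 4-volume.

```python
import math

def det(m):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

# Flat spacetime in light-cone coordinates (u, v, y, z), with c = 1:
# ds^2 = -du dv + dy^2 + dz^2, so g_uv = g_vu = -1/2 (assumed example).
g = [
    [0.0, -0.5, 0.0, 0.0],
    [-0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
sqrt_minus_g = math.sqrt(-det(g))

# Jacobian d(t, x)/d(u, v) for t = (u + v)/2, x = (v - u)/2.
jacobian = det([[0.5, 0.5], [-0.5, 0.5]])

print(det(g), sqrt_minus_g, jacobian)  # -0.25 0.5 0.5
```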
To prove that the 4-volume is an invariant, let's note that volumes transform using an object that is called the Jacobian.${}^{18}$ For our case of $(3+1)$-dimensional spacetime, the Jacobian $J$ is given by the determinant of the transformation matrix and so
\begin{equation*}
\mathrm{d}^{4} x^{\prime}=|J| \mathrm{~d}^{4} x, \quad J=\operatorname{det}\left(\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}}\right) .
\end{equation*}
We can now prove the general rule for finding the volume of an element of 4-space. We assume that the volume on the left is the usual volume of an infinitesimal element in a Cartesian flat space. We need to show that the Jacobian (i.e. the determinant of the transformation matrix) is equal, in magnitude, to $\sqrt{-g}$. The argument proceeds from a principle called local flatness.${}^{19}$ An observer will perceive spacetime to be flat at the point at which they reside, in much the same way that (unless you are living on a hillside) we tend to perceive the Earth as locally flat and only abandon our locally Euclidean street-maps when we look further afield. The following example fills in the details.
Example 5.7
Proof: Arguing from local flatness, we transform the Minkowski metric tensor $\boldsymbol{\eta}$ (for an observer's locally flat spacetime) into a general tensor $\boldsymbol{g}$ (for curved spacetime) at the particular point in spacetime of the observer's location. This can be done using${}^{20}$
\begin{equation*}
\underline{\boldsymbol{g}}=\underline{\boldsymbol{\Lambda}}^{\mathrm{T}} \underline{\boldsymbol{\eta}} \underline{\boldsymbol{\Lambda}},
\end{equation*}
and taking the determinant of both sides gives $\operatorname{det} \underline{\boldsymbol{g}}=\operatorname{det} \underline{\boldsymbol{\Lambda}}^{\mathrm{T}} \operatorname{det} \underline{\boldsymbol{\eta}} \operatorname{det} \underline{\boldsymbol{\Lambda}}$. However, $\operatorname{det} \underline{\boldsymbol{A}}=\operatorname{det} \underline{\boldsymbol{A}}^{\mathrm{T}}$ for any square matrix $\underline{\boldsymbol{A}}$, and also $\operatorname{det} \underline{\boldsymbol{\eta}}=-1$. We conclude therefore that $g=-(\operatorname{det} \underline{\boldsymbol{\Lambda}})^{2}$, so that $|\operatorname{det} \underline{\boldsymbol{\Lambda}}|=\sqrt{-g}$ and
\begin{equation*}
\mathrm{d}^{4} x^{\prime}=|\operatorname{det} \underline{\boldsymbol{\Lambda}}| \mathrm{~d}^{4} x=\sqrt{-g} \mathrm{~d}^{4} x .
\end{equation*}
The right-hand side of this equation is just the invariant 4-volume $\mathrm{d} \mathcal{V}$ that we met in eqn 5.53. Thus, the volume of an infinitesimal element in locally flat Cartesian space is equal to the invariant 4-volume, nicely illustrating the principle of local flatness.
${}^{17}$ The determinant of an $n \times n$ matrix $\boldsymbol{A}$ is computed using the rule
\begin{equation*}
\operatorname{det} \boldsymbol{A}=\sum_{i_{1}, \ldots, i_{n}} \varepsilon_{i_{1} \ldots i_{n}} A_{1 i_{1}} A_{2 i_{2}} \cdots A_{n i_{n}},
\end{equation*}
where $\varepsilon_{i_{1} \ldots i_{n}}$ is the Levi-Civita symbol, which is defined as $\varepsilon_{i_{1} \ldots i_{n}}=1$ for an even permutation of the indices, $-1$ for an odd permutation, and $0$ if any two indices are equal. If the matrix is diagonal, the determinant is simply the product of the $n$ non-zero elements.
${}^{18}$ The Jacobian is named after the German mathematician C. G. J. Jacobi (1804-1851). A general coordinate transformation may be written $x^{\mu^{\prime}}=x^{\mu^{\prime}}\left(x^{1}, x^{2}, x^{3}, \ldots, x^{n}\right)$ where $\mu=1,2, \ldots, n$. If we now arrange the $n \times n$ partial derivatives $\partial x^{\mu^{\prime}} / \partial x^{\nu}$ into the transformation matrix, the Jacobian is its determinant,
\begin{equation*}
J=\operatorname{det}\left(\frac{\partial x^{\mu^{\prime}}}{\partial x^{\nu}}\right) \equiv \frac{\partial\left(x^{1^{\prime}}, \ldots, x^{n^{\prime}}\right)}{\partial\left(x^{1}, \ldots, x^{n}\right)},
\end{equation*}
where we've introduced the commonly used notation for $J$. The Jacobian tells us how volume elements transform (and is discussed in more detail in Chapter 38).
${}^{19}$ The concept of local flatness is explained in more detail in Chapter 6 and will be used regularly from here onwards.
${}^{20}$ We write a matrix equation here where $\underline{\boldsymbol{\eta}}$ is the Minkowski matrix (i.e. the matrix with components $\eta_{\mu \nu}$) and $\underline{\boldsymbol{\Lambda}}$ is a matrix with components $\Lambda^{\mu}{}_{\nu}$. The equation is therefore simply a rewriting of eqn 5.8; in components it reads $g_{\mu \nu}=\Lambda^{\alpha}{}_{\mu} \Lambda^{\beta}{}_{\nu} \eta_{\alpha \beta}$.
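The proof in Example 5.7 can be checked numerically. The sketch below computes determinants with the Levi-Civita rule of sidenote 17, builds $\underline{\boldsymbol{g}}=\underline{\boldsymbol{\Lambda}}^{\mathrm{T}} \underline{\boldsymbol{\eta}} \underline{\boldsymbol{\Lambda}}$ (assuming this matrix relation between the two metrics), and confirms that $-\operatorname{det} \underline{\boldsymbol{g}}=(\operatorname{det} \underline{\boldsymbol{\Lambda}})^{2}$. The matrix `lam` is an arbitrary, hypothetical choice, made strictly diagonally dominant so that it is guaranteed to be invertible; the helper names are our own.

```python
import itertools
import math

def permutation_sign(perm):
    """Levi-Civita symbol for a permutation of (0, ..., n-1): +1 if even, -1 if odd."""
    sign = 1
    perm = list(perm)
    for i in range(len(perm)):
        # Put each element in place by swaps; every swap flips the sign.
        while perm[i] != i:
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            sign = -sign
    return sign

def det_levi_civita(a):
    """det A = sum over permutations p of eps(p) * A[0][p0] * A[1][p1] * ... ."""
    n = len(a)
    return sum(
        permutation_sign(p) * math.prod(a[row][p[row]] for row in range(n))
        for p in itertools.permutations(range(n))
    )

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(col) for col in zip(*x)]

eta = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]

# Hypothetical transformation matrix Lambda (strictly diagonally dominant, so invertible).
lam = [
    [1.2, 0.3, -0.5, 0.1],
    [0.4, 1.0, 0.2, -0.3],
    [0.0, 0.5, 1.1, 0.2],
    [0.3, -0.2, 0.1, 0.8],
]

g = matmul(transpose(lam), matmul(eta, lam))
det_g = det_levi_civita(g)
det_lam = det_levi_civita(lam)

print(det_levi_civita(eta))              # -1.0
print(-det_g, det_lam**2)                # equal up to rounding
print(math.sqrt(-det_g), abs(det_lam))   # |det Lambda| = sqrt(-g)
```

Summing over all $n!$ permutations is fine for $4 \times 4$ matrices but grows quickly with $n$, which is why cofactor expansion or elimination is preferred in practice.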
Chapter summary
The metric is a field that encodes the geometry of spacetime. It allows us to compute intervals between events. The $(0,2)$ metric tensor takes two vectors and outputs a number.
A metric can be visualized by working out its light cone structure. Light cones are defined by $\mathrm{d} s^{2}=0$.
The invariant volume element is given by
\begin{equation*}
\mathrm{d} \mathcal{V}=\sqrt{-g} \mathrm{~d}^{4} x \tag{5.65}
\end{equation*}
where $g$ is the determinant of the metric tensor $\boldsymbol{g}$.
The principle of local flatness says that an observer will perceive spacetime to be flat Minkowski spacetime at the point at which they reside.
Exercises
(5.1) By finding the determinant $g=\operatorname{det} \boldsymbol{g}$ of the relevant metric, compute the invariant volume element for the following systems:
(a) A flat two-dimensional surface in cylindrical coordinates.
(b) A two-dimensional spherical surface in spherical coordinates.
(c) The two-dimensional surface of a torus.
represents flat space in coordinates $(T, X)$ by investigating the coordinate transformation
\begin{equation*}
X=x \cosh t, \quad T=x \sinh t . \tag{5.68}
\end{equation*}
(5.3) Consider transforming into a reference frame moving at a constant speed vv along the xx-axis.
(a) Using the transformation $x=x^{\prime}-v t$, $y=y^{\prime}$, $z=z^{\prime}$, $t=t^{\prime}$, show that the Minkowski metric line element becomes, in a moving-coordinate reference frame,
\begin{equation*}
\mathrm{d} s^{2}=-\mathrm{d} t^{\prime 2}\left(1-v^{2}\right)+\mathrm{d} x^{\prime 2}+\mathrm{d} y^{\prime 2}+\mathrm{d} z^{\prime 2}-2 v \mathrm{~d} x^{\prime} \mathrm{d} t^{\prime} . \tag{5.69}
\end{equation*}
(b) What is the light cone structure of this metric?
(c) What is the proper time interval measured along an observer's world line?
(d) Writing the line element as a matrix with components $g_{\mu \nu}$, compute the inverse matrix with components $g^{\mu \nu}$.
(5.4) Consider a metric line element
where $k$ is a constant.
(a) What is the proper length between events that occur at $(r, \theta, \phi)$ and $(r+\mathrm{d} r, \theta, \phi)$?
(b) A light pulse is sent from $\left(t_{\mathrm{em}}, \chi, 0,0\right)$ to $\left(t_{\mathrm{ob}}, 0,0,0\right)$. Show that the photon's path can be described by
(a) What is the proper length interval between events at $x=x_{a}$ and $x_{a}+\mathrm{d} x$?
(b) What is the proper time interval between events at $t_{b}$ and $t_{b}+\mathrm{d} t_{b}$?
Part II
Curvature and general relativity
In this part of the book, we introduce the tools needed to understand the curvature of spacetime and its relationship to matter, culminating in the Einstein field equation.
In Chapter 6, we introduce some of the key ideas behind general relativity, and in particular the equivalence principle.
In Chapter 7, we describe the notion of what it means for vectors to be parallel in curved spaces. We introduce connection coefficients which allow us to take derivatives of vectors in curved space.
In Chapters 8 and 9, we investigate geodesics: the paths that particles fall along in spacetime.
Although we study curved space, we make physical observations locally, in the flat space of our experience. The method to translate between different frames of reference is described in Chapter 10.
In Chapter 11, we explain how the curvature of spaces is described using the Riemann tensor and the Ricci tensor. These will supply the left-hand side of the Einstein equation.
The right-hand side of Einstein's equation is supplied by the energy-momentum tensor, discussed in Chapter 12.
In Chapter 13, we write down the Einstein field equation, which is the foundation of general relativity.
In Chapter 14, we review some of the successes of general relativity that follow from the formalism described in this part of the book. Many of the topics described in this chapter will then be unpacked in more detail in the rest of the book.
6
Finding a theory of gravitation
6.1 Free fall and the equivalence principle
6.2 Why general relativity?
6.3 A differential equation to describe gravity
6.4 Local flatness
6.5 Time dilation in a gravitational field
Chapter summary
Exercises
A little reflection will show that the law of the equality of the inertial and gravitational mass is equivalent to the assertion that the acceleration imparted to a body by a gravitational field is independent of the nature of the body... It is only when there is numerical equality between the inertial and gravitational mass that the acceleration is independent of the nature of the body.
Albert Einstein
Newtonian gravitation acts instantaneously across the Universe and therefore picks out a unique time for all observers when the gravitational interaction occurs. This is inconsistent with relativity and its treatment of simultaneity. We must therefore look beyond Newton's theory for a complete description of gravitation. We begin our search with two claims. (1) A person falling under gravity can't feel their own weight. This is a statement of the principle of equivalence. (2) The metric alone describes the role of spacetime in the laws of physics. This is an expression of general covariance. Taken together, these two ideas, which turn out to be closely linked, will guide us towards a theory of gravitation.
6.1 Free fall and the equivalence principle
Central to general relativity is the notion that the inertial mass $m_{\mathrm{i}}$ of a particle and its gravitational mass $m_{\mathrm{g}}$ are identical. If we write an equation of motion for a particle in a gravitational field as
\begin{equation*}
m_{\mathrm{i}} \ddot{\vec{x}}=m_{\mathrm{g}} \vec{g}(\vec{x}),
\end{equation*}
then, taking $m_{\mathrm{i}}=m_{\mathrm{g}}$, the equation of motion tells us that the gravitational field $\vec{g}(\vec{x})$ acting on a particle is equal to the acceleration $\ddot{\vec{x}}$ of the coordinates of the particle. Einstein's great insight was to grasp that gravity and an accelerating coordinate system are actually the same thing. A freely falling${}^{1}$ astronaut sees, by definition, no change in her coordinates in her local rest frame and so concludes that there is no gravitational field. To put it another way, a freely falling observer cannot feel their own weight.
The weak principle of equivalence${}^{2}$ is a statement that gravitational and inertial mass are identical. This implies that it is possible to choose a coordinate system in which the laws of motion of a freely falling particle take the same form as in unaccelerated Cartesian coordinates in the absence of gravitation. An observer $O$ on Earth and her freely falling astronaut friend $O^{\prime}$ detect no difference in the laws of mechanics, except that $O$ observes the effect of, and herself feels, a gravitational field, while freely falling $O^{\prime}$ does not.${}^{3}$
Example 6.1
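Consider a cloud of $N$ test particles,${}^{4}$ each with mass $m$, subject to a uniform gravitational field $\vec{g}$. Let's fix our attention on one particular particle, labelled 1, which will have an equation of motion of the form
\begin{equation*}
m \ddot{\vec{x}}_{1}=m \vec{g}+\sum_{j=2}^{N} \vec{F}\left(\vec{x}_{1}-\vec{x}_{j}\right),
\end{equation*}
which is to say that it feels the force of gravity and the non-gravitational interaction forces from the $N-1$ other particles (forces which depend only on the separations of the particles). This equation of motion describes the dynamics within the coordinate system of a particular observer $O$. Next we make the coordinate transformation to another frame of reference that is uniformly accelerating with acceleration $\vec{g}$, which points in the $-x$-direction. The coordinate transformation is
\begin{equation*}
\vec{x}^{\prime}=\vec{x}-\frac{1}{2} \vec{g} t^{2}, \quad t^{\prime}=t,
\end{equation*}
so that $\ddot{\vec{x}}_{1}^{\prime}=\ddot{\vec{x}}_{1}-\vec{g}$, while the separations $\vec{x}_{1}-\vec{x}_{j}=\vec{x}_{1}^{\prime}-\vec{x}_{j}^{\prime}$ are unchanged. The equation of motion becomes
\begin{equation*}
m \ddot{\vec{x}}_{1}^{\prime}=\sum_{j=2}^{N} \vec{F}\left(\vec{x}_{1}^{\prime}-\vec{x}_{j}^{\prime}\right),
\end{equation*}
which looks like the equation of motion for the particle in the absence of the gravitational field $\vec{g}$.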
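The argument of Example 6.1 can be checked numerically in one dimension. The sketch below rests on assumptions of our own: three particles of unit mass joined by linear springs (a force that depends only on separations), a uniform field $g$ pointing in the $-x$-direction, hypothetical initial conditions, and velocity-Verlet time stepping. It integrates the motion once with the field and once without, then shifts the first result into the frame $x^{\prime}=x-\frac{1}{2} g t^{2}$; the two sets of positions should agree to rounding error.

```python
def accelerations(xs, g, k=1.0, m=1.0):
    """a_i = g + (1/m) * sum_j F(x_i - x_j), with a linear spring force F(d) = -k d."""
    n = len(xs)
    return [
        g + sum(-k * (xs[i] - xs[j]) for j in range(n) if j != i) / m
        for i in range(n)
    ]

def simulate(xs, vs, g, dt, steps):
    """Velocity-Verlet integration; returns the positions at time t = steps * dt."""
    xs, vs = list(xs), list(vs)
    acc = accelerations(xs, g)
    for _ in range(steps):
        xs = [x + v * dt + 0.5 * a * dt**2 for x, v, a in zip(xs, vs, acc)]
        new_acc = accelerations(xs, g)
        vs = [v + 0.5 * (a + b) * dt for v, a, b in zip(vs, acc, new_acc)]
        acc = new_acc
    return xs

g = -9.81                  # uniform field, pointing in the -x-direction (assumed value)
x0 = [0.0, 1.0, 2.5]       # hypothetical initial positions
v0 = [0.3, -0.2, 0.0]      # hypothetical initial velocities
dt, steps = 1e-3, 2000
t = dt * steps

with_field = simulate(x0, v0, g, dt, steps)        # observer O: field present
without_field = simulate(x0, v0, 0.0, dt, steps)   # same particles, no field

# Transform the first result into the frame accelerating with g: x' = x - g t^2 / 2.
falling_frame = [x - 0.5 * g * t**2 for x in with_field]

print(max(abs(a - b) for a, b in zip(falling_frame, without_field)))  # rounding-level
```

Velocity Verlet treats the constant part of the acceleration exactly, so the agreement here is limited only by floating-point rounding rather than by the step size.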
Example 6.2
The principle of equivalence can also be illustrated by considering the acceleration measuring device in Fig. 6.1. It consists of a mass in a box suspended by springs. If the box is carried by an observer then it provides a measure of acceleration: an observer accelerating in the positive $x$-direction will see the mass displaced in the negative $x$-direction. The mass will also be affected by the presence of a gravitational field, which also causes a displacement, but there is no way to distinguish this from the effect of an acceleration.
We can strengthen the equivalence principle to generalize it beyond the realm of mechanics. One form of the strong principle of equivalence says that it is possible to find a frame of reference where all non-gravitational laws of physics take on their special relativistic forms. In more detail:
The strong principle of equivalence: at every spacetime point in an arbitrary gravitational field it is possible to choose a local coordinate system such that, within a sufficiently small region of the point in question, all of the laws of nature take the same form as in unaccelerated Cartesian coordinate systems in the absence of gravitation.
${}^{3}$ There are some caveats. One is that the absence of any difference holds only over a sufficiently small region of space and time. The size of the region is described below.
${}^{4}$ In this chapter, test particles have a low mass compared to the source of the gravitational field. As a result, we neglect any gravitational interaction between them.
Fig. 6.1 A machine to measure acceleration consisting of a mass suspended in a box by light springs.
${}^{5}$ Tidal forces arise due to the difference in gravitational field strength across a body. The gravitational field due to the Sun and Moon varies across the Earth and this results in a tidal force on our planet and its oceans. The tides (the rise and fall of sea level due to the motion of the Moon and the Sun) arise because fluid water (the oceans) responds more readily to tidal forces than solid rock (the Earth). Even the solid Earth responds a little bit to tidal forces (the so-called 'Earth tide') and this is responsible for the Large Electron-Positron Collider at CERN, whose circumference is about 27 km, stretching by a few millimetres as the Earth stretches, something the CERN scientists have to correct for.
Fig. 6.2 (a) In a uniform field, two test particles released along parallel paths at different positions will both accelerate in the same direction. A coordinate transformation can be used to remove this acceleration. (b) A real gravitational field, such as that generated by a planet, is never uniform and so test particles initially released along parallel paths will approach each other. The apparent force causing acceleration between them is called a tidal force.
Example 6.3
Special relativity tells us that a particle at rest in an inertial frame moves along the time axis. The strong principle of equivalence tells us that the same must be true in general relativity, so that free-falling particles follow a curve whose tangent vector is always timelike (known as a timelike curve). The timelike curves followed by freely falling particles are called geodesics. The equivalence principle has ended up being surprisingly powerful because it has allowed us to take a result from special relativity (in which gravity is completely absent) and deduce from it an important result in general relativity (which includes gravity): free-falling particles follow timelike curves!
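Whether a tangent vector $u^{\mu}$ is timelike can be read off from the sign of $g_{\mu \nu} u^{\mu} u^{\nu}$: with the $\operatorname{diag}(-1,1,1,1)$ signature used in this book it is negative for timelike vectors, zero for null ones and positive for spacelike ones. The sketch below applies this test with the Minkowski metric of a LIF (units with $c=1$); the function name `classify` and the tolerance are our own choices.

```python
ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]  # Minkowski, c = 1

def classify(u, g=ETA, tol=1e-12):
    """Return 'timelike', 'null' or 'spacelike' from the sign of g_{mu nu} u^mu u^nu."""
    norm = sum(g[m][n] * u[m] * u[n] for m in range(4) for n in range(4))
    if norm < -tol:
        return "timelike"
    if norm > tol:
        return "spacelike"
    return "null"

print(classify([1, 0, 0, 0]))         # timelike: particle at rest, moving along the time axis
print(classify([1, 1, 0, 0]))         # null: a light ray
print(classify([1.25, 0.75, 0, 0]))   # timelike: particle moving at speed 0.6
print(classify([0, 1, 0, 0]))         # spacelike
```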
Let's emphasize an important idea: at the particular point where the observer is localized, there is no way to distinguish between a gravitational field and acceleration. Therefore, we can transform away the apparent effect of acceleration to give the physics expected from special relativity in the absence of gravity. As a result of the strong principle of equivalence, no experiment can distinguish between a homogeneous gravitational field and an accelerating reference frame. That is, if all points experience a uniform gravitational field (as they did in Example 6.1), then we can find a coordinate system where we transform away the equivalent effects of the gravitational field and acceleration for all points, so that it appears that gravity is not acting. In less technical language: a uniform gravitational field is equivalent to there being no gravitational field. This would seem to make hunting for the effects of gravitation a hopeless endeavour, since we could never be sure that an apparent gravitational effect wasn't simply the effect of an accelerating coordinate system. All is not lost, however, because real gravitational fields are never homogeneous! Over sufficiently large distances, a real gravitational field can be distinguished from an accelerating reference frame. This can be done by noticing the presence of tidal forces,${}^{5}$ as shown in the following example.
Example 6.4
An observer awakes in a spaceship feeling the apparent effect of gravity holding them in bed. Is this due to the acceleration of the ship or to the gravitational field of a nearby planet? A planet's field, assumed spherically symmetric, is not homogeneous, so can, in principle, be detected. The observer releases two test particles, initially moving parallel to each other (Fig. 6.2). If the particles start to move towards each other (or away from each other) we attribute this to a tidal force. The presence of a tidal force tells the observer that (s)he is not simply in an accelerating frame and therefore gravity must be acting.
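The size of the effect in Example 6.4 can be estimated with Newtonian gravity, which is adequate for this illustration. Two particles released at rest a horizontal distance $d$ apart, at distance $r$ from the centre of a planet of mass $M$, each fall towards the centre, so their separation shrinks with a relative acceleration of about $G M d / r^{3}$. The sketch below integrates both trajectories; the Earth-like values of $G M$ and $r$, the separation $d$ and the step size are all assumptions of ours. In a uniform field the separation would not change at all.

```python
import math

GM = 3.986e14   # m^3 s^-2, Earth-like planet (assumed)
r = 6.371e6     # m, distance of the spaceship from the planet's centre (assumed)
d = 10.0        # m, initial horizontal separation of the test particles (assumed)

def accel(x, y):
    """Newtonian acceleration towards the planet's centre at the origin."""
    s = math.hypot(x, y)
    return -GM * x / s**3, -GM * y / s**3

def fall(x, y, t_total, dt=1e-3):
    """Velocity-Verlet trajectory of a particle released from rest at (x, y)."""
    vx = vy = 0.0
    ax, ay = accel(x, y)
    for _ in range(round(t_total / dt)):
        x += vx * dt + 0.5 * ax * dt**2
        y += vy * dt + 0.5 * ay * dt**2
        nax, nay = accel(x, y)
        vx += 0.5 * (ax + nax) * dt
        vy += 0.5 * (ay + nay) * dt
        ax, ay = nax, nay
    return x, y

t_total = 1.0  # s
x1, _ = fall(-d / 2, r, t_total)
x2, _ = fall(+d / 2, r, t_total)
shrink = d - (x2 - x1)                             # how much closer the particles get
estimate = 0.5 * (GM * d / r**3) * t_total**2      # from relative acceleration GM d / r^3
print(shrink, estimate)
```

After one second the particles have moved together by only a few micrometres, which illustrates why measurements over a small enough region cannot reveal the field.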
Our investigation of the equivalence principle allows us to learn a number of lessons that will be very important in the rest of this book as we describe general relativity.
Lesson 1: A freely falling observer does not feel any force and so can't tell if a gravitational field is present. They fall along a timelike curve called a geodesic.${}^{6}$
Lesson 2: Locally, the freely falling observer can set up a laboratory covered by the unaccelerated coordinate system familiar from special relativity. This is an example of a local inertial frame (LIF).${}^{7}$ All laws of physics described in a LIF are those from special relativity.
Lesson 3: If measurements are made over a sufficiently small time interval and length scale, the observer can never detect the effect of gravitation. However, a real gravitational field will be inhomogeneous, so an observer can detect the effects of gravitation by making measurements at different points in spacetime that show the effects of tidal forces.${}^{8}$
Let's now get a quick physics payoff from the equivalence principle and show that light is bent by a gravitational field.
Example 6.5
Consider the experiment shown in Fig. 6.3(a) in which a rocket accelerates upwards with an acceleration $g$. A particle passes through the windows of the rocket and travels through the interior of the rocket, but by the time it has crossed the width of the rocket, the rocket has moved upwards and so the particle ends up landing on the rocket wall at a position which is lower than the point at which it entered. Figure 6.3(b) illustrates that from the point of view of an astronaut inside the rocket the particle follows a curved trajectory.${}^{9}$ The parabolic path follows the equations $x=x_{0}+v t$, where $v$ is the velocity of the particle, and $y=y_{0}-\frac{1}{2} g t^{2}$, where $x$ and $y$ are the coordinates [horizontal and vertical respectively, in Fig. 6.3(b)] measured in the rocket frame and $\left(x_{0}, y_{0}\right)$ are the coordinates of the point where the particle enters the rocket at $t=0$. Thus, taking $x_{0}=0$, we have $y=y_{0}-g x^{2} /\left(2 v^{2}\right)$.
However, so far, this analysis has shown nothing remarkable. But now we can deploy the equivalence principle, which implies that the astronaut cannot tell whether her rocket is accelerating upwards with an acceleration $g$ or whether she is simply experiencing a gravitational field $g$. So it is possible that her rocket is parked on a planet with a gravitational field $g$, and the bending of the particle trajectory would still occur. We can make the experiment particularly vivid by imagining that the particle is a photon, in which case the argument suggests that light is bent by gravitational fields! Our analysis shows that, for our rocket problem, the light beam (setting $v=c$) changes from purely horizontal to travelling at an angle of $\approx|\mathrm{d} y / \mathrm{d} x|=g x / c^{2}$ to the horizontal.
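To get a feel for the size of this effect, the sketch below evaluates the drop $g x^{2} /\left(2 v^{2}\right)$ and the angle $g x / v^{2}$ across a rocket of width $x$ (with the particle entering at $x_{0}=0$), first for a slow particle and then for light, $v=c$. The rocket width and the particle speed are hypothetical values, and $g$ is taken to be the acceleration at the Earth's surface.

```python
g = 9.81        # m s^-2, acceleration of the rocket (assumed Earth-like)
c = 2.998e8     # m s^-1, speed of light
width = 10.0    # m, hypothetical width of the rocket

def drop(v, x=width):
    """Vertical fall y0 - y = g x^2 / (2 v^2) after crossing a horizontal distance x."""
    return g * x**2 / (2 * v**2)

def angle(v, x=width):
    """Angle |dy/dx| = g x / v^2 to the horizontal on reaching x (radians)."""
    return g * x / v**2

for label, v in [("slow particle, 100 m/s", 100.0), ("light", c)]:
    print(f"{label}: drop = {drop(v):.3e} m, angle = {angle(v):.3e} rad")
```

For light the drop across the rocket is only about $5 \times 10^{-15}$ m, which makes plain why the effect has to be sought on astronomical scales.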
This argument is only part of the story however. It is quantitatively correct for a particle beam travelling at a non-relativistic velocity, but it turns out that the bending effect for photons in the field of a spherical mass is actually three times larger than suggested by this analysis. The problem for the photon case (or any particle travelling at relativistic speeds) lies in the incorrect assumption that our experiment is carried out over a distance small enough to allow a straightforward application of the equivalence principle. However, the intuition that light should bend is correct, even if the size of the bending is not, and our argument does account for one third of the total deflection effect.
${}^{6}$ We can contrast the difference in philosophy between the Newtonian and Einsteinian views of gravitation. In Newtonian gravitation, the Sun exerts a force on the Earth; in Einsteinian gravitation, the Earth feels no force and simply falls freely along a geodesic which is a path that orbits the Sun.
${}^{7}$ If orthonormal basis vectors are used to describe a LIF, as in the case of the usual conventions of special relativity, the frame is sometimes called a local Lorentz frame.
${}^{8}$ This notion of 'sufficiently small' provides the caveat for the earlier sidenote.
${}^{9}$ This is nothing particularly to do with special relativity (the rocket is only starting to accelerate and so its speed is much less than $c$) and the same effect would be seen with an accelerating car in vertical rain (with the diagram rotated).
${}^{10}$ This was the effect that was measured very early in the history of relativity and gave an initial vindication of Einstein's ideas.
↷ Section 24.1 of Chapter 24 presents the calculation done properly.
Fig. 6.3 (a) Time snapshots in an inertial frame in which a rocket accelerates (upwards in this diagram). A particle passes through the windows of the rocket in a direction perpendicular to the direction of acceleration of the rocket. (b) In the rocket's frame, the path of the particle follows a parabolic trajectory.
We'll see later how gravitation follows from the curvature of spacetime encoded in the components of the metric tensor. This curvature can affect intervals in space and also in time; the argument presented above only takes the time part into account. However, the measurement of the effect for (highly relativistic) photons in a gravitational field from a spherically symmetric mass requires us to consider the space part too, since the deflection from this enters at the same order as that of the time part, owing to the photon exploring a spatial distance large enough to experience the spatial contribution to the curvature. The point is that this measurement is made over a distance where the curvature of space can be discerned, and is therefore not local enough to allow the straightforward application of the equivalence principle. If you treat the spatial part of the curvature as well, you recover the factor of three missing from the present analysis.
In spite of these difficulties, let's now use our result to have a first, hand-waving attempt at the famous problem of the bending of starlight around the Sun, visible during a total solar eclipse.${}^{10}$ We will treat the problem properly later, but here we make a crude estimate in which we discard numerical factors with wild abandon. Starlight passing close to the surface of the Sun will experience a gravitational field equal to $g=G M_{\odot} / R_{\odot}^{2}$. This is where the most bending will occur, but there will also be bending when the light is further away. Crudely, we can say that the starlight will be bent over a distance which must scale as $R_{\odot}$ and the gravitational field that it will experience will be of order $g=G M_{\odot} / R_{\odot}^{2}$. Therefore, the angle of deflection $\theta$ (using our previous result that light is bent by an angle given very roughly by $|\mathrm{d} y / \mathrm{d} x|=g x / c^{2}$, with $x \sim R_{\odot}$) will be
\begin{equation*}
\theta \sim \frac{g R_{\odot}}{c^{2}}=\frac{G M_{\odot}}{R_{\odot} c^{2}} .
\end{equation*}
Remarkably, this turns out to be the correct answer apart from a factor of 4 .
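Putting in numbers, the sketch below evaluates the crude estimate $G M_{\odot} /\left(R_{\odot} c^{2}\right)$ and four times it, converting from radians to arcseconds. The values of $G M_{\odot}$ and $R_{\odot}$ are standard reference values that we have supplied; they are not taken from the text.

```python
import math

GM_SUN = 1.32712440018e20   # m^3 s^-2, solar gravitational parameter (supplied value)
R_SUN = 6.957e8             # m, nominal solar radius (supplied value)
C = 299_792_458.0           # m s^-1, speed of light

ARCSEC_PER_RAD = 180 / math.pi * 3600

crude = GM_SUN / (R_SUN * C**2)   # theta ~ g R / c^2 = G M / (R c^2)
print(f"crude estimate: {crude:.3e} rad = {crude * ARCSEC_PER_RAD:.2f} arcsec")
print(f"four times it:  {4 * crude * ARCSEC_PER_RAD:.2f} arcsec")
```

The crude estimate comes out at about 0.44 arcseconds; four times this is about 1.75 arcseconds, the familiar general-relativistic value for a ray grazing the Sun.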
We can now attempt to find gravitational fields by transforming away all effects of uniform acceleration from our coordinate systems and conclude that whatever is left over at the end must be gravity. However, we still lack a clear guide to help us formulate a theory of relativistic gravitation. Before moving on, we present a brief historical interlude. Einstein's own motivation in invoking the principle of equivalence is often explained in terms of the influence of Mach's principle on his thinking, as discussed in the next example.
Example 6.6
Newton's laws appear to have restricted applicability: they apply in inertial frames. In non-inertial frames, new inertial forces appear.${}^{11}$ These present a philosophical difficulty if we accept the principle of relativity, since they imply that mysterious new forces appear in non-inertial frames, but it is not immediately clear if these are exerted by space itself or by other bodies.
How do we define an inertial or non-inertial frame if there is no absolute space against which to judge whether the frame is accelerating or not? Ernst Mach${}^{12}$ came up with a solution that was a great influence on Einstein. Mach says we can judge motion relative to all of the matter in the Universe. We say an inertial frame is unaccelerated with respect to the fixed stars.${}^{13}$ This is helpful in that it allows us to use relativity to propose that inertial forces arise because of the acceleration of a mass relative to this fixed frame or, equivalently, the acceleration of the fixed frame with respect to the mass. This saves Newton's laws: they apply in all frames of reference with the extra non-inertial forces being real, physical forces that arise from the motion of the stars. We can then attempt to formulate a theory of inertial forces as arising from the interaction of inertial masses. By analogy with electromagnetism this would have a static contribution like Coulomb's law that varies according to $\alpha m_{i} m_{j} / r^{2}$, with $\alpha$ a constant. It might also have more complicated parts that contribute. For example, when masses are accelerating relative to each other${}^{14}$ we would expect a force of the form $\beta m_{i} m_{j} / r$, with $\beta$ a constant.
Now, if we accept the principle of equivalence, then (the static part of) the inertial interaction can be directly identified with the gravitational force. The equivalence principle is then a principle of the equivalence of the gravitational force and the static part of the inertial force. We would then conclude that the relative acceleration of masses gives rise to a $1 / r$ contribution to the force. This does not exhaust the possible contributions to the inertial force, however, and so we should expect further contributions to the inertial part of the force law.
So why not continue along this line and describe gravitation by considering the interactions between particles in this way? The reason is that we are faced with a very complicated nonlinear problem. This is because mass and energy are interchangeable, and mass-energy is the source of gravitation. For example, when two particles act as a source of gravitation then to correctly evaluate the total size of the source of the gravitational effect, we must compute not only their individual gravitational potential energies but also the interaction potential energy between them. Each of these makes a contribution to the gravitational force. Owing to this complication of nonlinearity, we must abandon the discussion in terms of the interaction of individual masses. Fields then become important as they simplify our task. They allow us to avoid the question of how particles directly influence each other. Instead, a particle contributes to, and is acted on by, a field that is defined locally. In the field viewpoint, we therefore restrict our view to a local point in spacetime, and compute the effect of gravitation at the point of interest.${}^{15}$
6.2 Why general relativity?
Einstein's theory of gravitation is called general relativity because it is founded on the principle of general covariance.${}^{16}$
The principle of general covariance: the laws of physics must be invariant under all coordinate transformations, so the laws must hold in different coordinate systems.
This principle is, in a way, a restatement of the equivalence principle, but framing it this way is very helpful because it puts constraints on the sorts of equations we can write down that will be physically valid. For a start, we can't have any preferred bases in our theory (any more than we could claim in Newtonian mechanics${}^{17}$ that the $y$-axis was particularly special). This means that we can't write down a law which singles out some specific components of a vector or a tensor, for the simple reason that this would look very different in different frames of reference.
${}^{11}$ An example is the centrifugal force experienced in a rotating frame. See Chapter 8 for a discussion of inertial forces.
${}^{12}$ Ernst W. J. W. Mach (1838-1916).
${}^{13}$ 'When the subway jerks, it's the fixed stars that throw you down.' Ernst Mach, attributed by Philipp Frank (1884-1966).
${}^{14}$ An accelerating charge in electromagnetism exerts a force that varies as $1 / r$. This is closely related to the interaction that gives rise to the emission of electromagnetic radiation from an accelerating charge.
${}^{15}$ Even using fields, nonlinearity continues to make the problem a complicated one. This can be made clearer by considering the contrasting case of electromagnetism. In electromagnetism, charges are the sources of electromagnetic fields and these fields add linearly. The charges determine the fields and the fields determine the motion of the charges. An electromagnetic wave passes through another electromagnetic wave without scattering precisely because the theory is linear: the (electrically neutral) waves are not sources of electric field. This is not the case for gravitation. For example, gravitational waves interact with other gravitational waves since a gravitational wave is also a source of gravitation.
${}^{16}$ As pointed out by several authors, 'general relativity' is something of a misnomer: it doesn't describe a form of relativity that is more general than special relativity.
${}^{17}$ The law
is a relationship between two geometrical quantities (in this case vectors) and no preferred basis is singled out. On the other hand, if the equation were something like
we would smell a rat, not only because it's not dimensionally correct but because it somehow makes the $y$-direction special and so it wouldn't transform sensibly.
${}^{18}$ Indeed, we could attempt to regard this equation as a component of a more general tensor equation. As we will discover later, the left-hand side of the correct equation will turn out to be something to do with spacetime derivatives (but we will need a more sophisticated notion of curvature to get this right) and the right-hand source term is indeed something to do with the mass density, but in fact we need the energy-momentum tensor introduced in Section 4.5.
${}^{19}$ A valid tensor equation can often be identified as having a component version whose indices match on both sides of the equation. Such an equation is sometimes called manifestly covariant.
${}^{20}$ By $\operatorname{diag}(-1,1,1,1)$ we mean a $4 \times 4$ matrix with these diagonal elements and all other elements equal to zero.
Example 6.7
Failed theory of gravitation: Based on what we have learned in the book so far, here is a misguided attempt at a theory of gravitation, followed by an explanation of why it won't work. We know Newton's approach to gravitation ended up with $\nabla^{2}\Phi=4\pi G\rho$ (eqn 15 of Chapter 0), so let's try to fix it up by using the 4-vector generalization of $\vec{\nabla}^{2}$, which is $\partial^{2}=\partial_{\mu}\partial^{\mu}=-\partial_{t}^{2}+\vec{\nabla}^{2}$. So our patched-up theory of gravitation could be written down as
\begin{equation*}
\partial^{2} \Phi=-\frac{1}{c^{2}} \frac{\partial^{2} \Phi}{\partial t^{2}}+\vec{\nabla}^{2} \Phi=4 \pi G \rho \tag{6.6}
\end{equation*}
This equation looks like a worthy guess, but it fails at the first hurdle, general covariance! It does not take the same form in different frames of reference, because the source term $\rho$ is not a scalar and will transform when you go into different inertial frames. So this equation will not do, but it is heading in the right direction.${ }^{18}$
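A minimal numerical sketch of why $\rho$ fails as a source, assuming the source is pressureless dust at rest whose only non-zero energy-momentum component is $T^{00}=\rho_0$ (units with $c=1$; the names `boost_x`, `rho0` and `beta` are illustrative). Boosting this tensor along $x$ shows that the density measured in the new frame is $\gamma^{2}\rho_0$: one factor of $\gamma$ from the length contraction of the volume and one from the increased energy of each particle. A genuine scalar would come out unchanged.

```python
import numpy as np

def boost_x(beta):
    """Matrix Lambda^mu_nu of a Lorentz boost with speed beta along x (c = 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    lam = np.eye(4)
    lam[0, 0] = lam[1, 1] = gamma
    lam[0, 1] = lam[1, 0] = -gamma * beta
    return lam

rho0 = 1.0                     # rest-frame density (arbitrary units)
T = np.zeros((4, 4))
T[0, 0] = rho0                 # dust at rest: T^{00} = rho0, all else zero

beta = 0.6
lam = boost_x(beta)
T_new = lam @ T @ lam.T        # T'^{mu nu} = Lambda^mu_a Lambda^nu_b T^{ab}
gamma = 1.0 / np.sqrt(1.0 - beta**2)

print(T_new[0, 0])             # gamma^2 * rho0 (1.5625 here, up to rounding), not rho0
```

The density is therefore the $00$-component of a tensor rather than a scalar, which is why the margin note points towards the energy-momentum tensor.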
The equivalence principle has told us that any physical law that can be expressed in special relativity without gravity will hold in a LIF, even when gravity is present. General covariance then tells us that the law holds in the same form in different coordinate systems. We choose to express our laws using tensors and so a valid tensor equation${ }^{19}$ compatible with special relativity also holds in a LIF even when gravity is present, and this same equation should take the same form in any coordinate system in the presence of gravity. In short: a valid tensor equation in the absence of gravity is a valid tensor equation in the presence of gravity. But something must change in the mathematics to herald the presence of gravitation! That something is geometry. The geometry of spacetime is altered by the presence of gravity, causing it to become curved. This is expressed in terms of the metric tensor. In the LIF of special relativity, the metric tensor, which can be thought of as a box of clocks and rulers that tells us how to measure vectors, is given by the Minkowski metric tensor $\boldsymbol{\eta}$, whose components are${ }^{20}$ $\operatorname{diag}(-1,1,1,1)$. However, in a general frame of reference, the metric is written as a tensor $\boldsymbol{g}$, whose components (i) are different from those of $\boldsymbol{\eta}$ and (ii) will in general vary in space and time. This provides our next lesson.
Lesson 4: The effect of gravitation on our tensor equations describing physical laws is to change the Minkowski metric $\boldsymbol{\eta}$ to a new metric $\boldsymbol{g}$.
Our strategy to find physical theories in the presence of gravitation will be to take a valid tensor equation that works in special relativity and upgrade it, simply by changing the Minkowski metric $\boldsymbol{\eta}$ to the general metric tensor $\boldsymbol{g}$. The next step in our search for a theory of gravity is therefore to seek a theory that determines this metric $\boldsymbol{g}$ in the presence of gravitation and also obeys the principles of equivalence and of general covariance.
6.3 A differential equation to describe gravity
The metric is a rulebook that allows us access to the distances and angles between points in spacetime. For example, the interval in spacetime along a curve $x^{\mu}(\lambda)$ from $\lambda=a$ to $\lambda=b$ may be worked out using the metric via the prescription
In the absence of a gravitational field, we might evaluate this path length in any number of coordinate systems, but we always obtain the same answer to questions such as whether the ratio of a circle's circumference to its diameter is $\pi$, or whether the angles in a triangle add up to $\pi$. The only way in which this is possible is if $\boldsymbol{g}$ at one point is related to $\boldsymbol{g}$ at another point. This implies that the tensor $\boldsymbol{g}$, a function of position $x$,${ }^{21}$ should satisfy a differential equation. It is this differential equation, the equation that tells us how the metric $\boldsymbol{g}(x)$ varies in spacetime, that allows us to separate the true effects of gravity from those effects that result from a particular choice of coordinates. From this point of view, the metric is a field. We can think of a field${ }^{22}$ as a machine into which we input a position in spacetime $x^{\mu}=y^{\mu}$. The field outputs the value of the tensor $\boldsymbol{g}(y)$ appropriate for the point $y$. The field $\boldsymbol{g}$ must obey a differential equation of motion that we'll call a field equation.
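The coordinate independence described above can be checked numerically. The sketch below assumes the prescription for the interval takes the standard form $s=\int_a^b\left|g_{\mu\nu}\,\frac{\mathrm{d}x^{\mu}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\nu}}{\mathrm{d}\lambda}\right|^{1/2}\mathrm{d}\lambda$ and evaluates it with the trapezoidal rule for a circle of radius $R$ in flat two-dimensional space, once in Cartesian coordinates (metric $\operatorname{diag}(1,1)$) and once in polar coordinates (metric $\operatorname{diag}(1,r^{2})$). The helper `path_length` and its arguments are hypothetical names chosen for illustration.

```python
import numpy as np

def path_length(metric, curve, velocity, a, b, n=2001):
    """Trapezoidal estimate of s = int_a^b |g_mn dx^m/dlam dx^n/dlam|^(1/2) dlam.

    metric(x)     -> matrix g_mn at the point x
    curve(lam)    -> coordinates x^m(lam)
    velocity(lam) -> tangent vector dx^m/dlam
    """
    lams = np.linspace(a, b, n)
    integrand = np.array([
        np.sqrt(abs(velocity(lam) @ metric(curve(lam)) @ velocity(lam)))
        for lam in lams
    ])
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lams))

R = 2.0
# Cartesian coordinates (x, y): metric diag(1, 1)
cart = path_length(lambda x: np.eye(2),
                   lambda lam: np.array([R * np.cos(lam), R * np.sin(lam)]),
                   lambda lam: np.array([-R * np.sin(lam), R * np.cos(lam)]),
                   0.0, 2 * np.pi)
# Polar coordinates (r, phi) for the same flat plane: metric diag(1, r^2)
polar = path_length(lambda x: np.diag([1.0, x[0] ** 2]),
                    lambda lam: np.array([R, lam]),
                    lambda lam: np.array([0.0, 1.0]),
                    0.0, 2 * np.pi)

print(cart / (2 * R), polar / (2 * R))   # both equal pi to rounding error
```

The metric components differ between the two descriptions, yet the circumference-to-diameter ratio is the same; the absolute value inside the square root lets the same helper handle timelike or spacelike curves in a Lorentzian metric.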
The equation in question arranges the components of $\boldsymbol{g}$ and their derivatives into a new tensor that describes the curvature of spacetime,${ }^{23}$ called the Riemann tensor $\boldsymbol{R}$. It is curvature that is the true effect of gravitating mass and which can never simply be the result of having chosen a perverse set of coordinates. It will transpire that non-zero components of $\boldsymbol{R}$ tell us about curvature, and curvature means gravitation.
Despite these pointers, we still have relatively little guidance on how to put together the field theory of gravitation. However, there is one other clue: the theory must be compatible with (i) Newtonian gravitation and (ii) special relativity. That is to say that, in the limit of weak gravitational fields and low velocities, the predictions of the field theory must recreate those of Newton's universal theory of gravitation. In the limit of vanishing gravitational field, the theory must agree with special relativity. Conceptually, therefore, the field theory of gravity (that is, general relativity) fits into the scheme shown in Fig. 6.4. We can summarize this section as follows:
Fig. 6.4 The relationship between general relativity, special relativity, Newtonian gravity and Newtonian mechanics as a function of velocity $v$ and gravitational constant $G$.
${ }^{24}$ The observer will need to set up their local frame to have the Lorentz signature [i.e. the $(-1,1,1,1)$ pattern of signs on the diagonal of the Minkowski metric].
${ }^{25}$ There is a distinction between a frame of reference and a set of coordinates. A frame of reference is defined by some basis vectors and so has an existence independent of a coordinate system. A general LIF will be covered in coordinates for which $g_{\mu\nu}\left(x=x^{\alpha}(\mathcal{P})\right)=\eta_{\mu\nu}$ and $\left.\partial g_{\mu\nu}/\partial x^{\alpha}\right|_{x=x^{\alpha}(\mathcal{P})}=0$ in an infinitesimal region around the point $(t(\mathcal{P}),x(\mathcal{P}),y(\mathcal{P}),z(\mathcal{P}))$. This can be achieved using Riemann normal coordinates, described in Chapter 35. The freely falling frame has its time direction $\boldsymbol{e}_{t}$ tangent to a geodesic. It remains freely falling as a function of the local (proper) time so continues to be a LIF for the whole time it is falling. In terms of coordinates, it is only flat in a very small spatial region around the origin of the frame, but for long intervals of proper time.
Lesson 5: General relativity provides a field theory of the metric field $\boldsymbol{g}(x)$, which encodes gravitation through its effect in providing a curvature to spacetime.
6.4 Local flatness
The Minkowski metric can be used by any observer who is not subject to a gravitational field. Such a region of spacetime has no curvature and so Minkowski spacetime is flat. By the principle of equivalence, a freely falling observer does not feel a gravitational field provided they make measurements over a small enough region of spacetime. This implies that, locally, all observers can treat spacetime as flat if they use a LIF. This is the content of the local flatness theorem.
It is always possible to reduce a metric field $\boldsymbol{g}(x)$, evaluated at a single point $x=\mathcal{P}$, to the Minkowski metric $\boldsymbol{\eta}$. That is, we can introduce coordinates $x^{\alpha}(\mathcal{P})$ such that the components of the tensors obey the equation
\begin{equation*}
g_{\mu\nu}\left(x=x^{\alpha}(\mathcal{P})\right)=\eta_{\mu\nu}.
\end{equation*}
This is possible since $\boldsymbol{g}$ is represented by a symmetric $4\times 4$ matrix, which can always be diagonalized; rescaling the basis vectors then brings each diagonal element to $\pm 1$. This implies that there are potentially lots of local frames (i.e. not just the special freely falling LIF we have described so far) that appear flat at a single point and, in these frames, the observer uses the Minkowski metric of flat spacetime to manipulate vectors.${ }^{24}$ The point of the local flatness theorem is that it is also possible to find coordinates such that the derivatives of the metric vanish at this point:
\begin{equation*}
\left.\frac{\partial g_{\mu\nu}}{\partial x^{\alpha}}\right|_{x=x^{\alpha}(\mathcal{P})}=0.
\end{equation*}
This means that spacetime will be described by the Minkowski metric in an infinitesimal region around $x=x^{\alpha}(\mathcal{P})$, making the notion of local flatness mathematically respectable. A local inertial frame (LIF) is a frame where this requirement is satisfied at some point $x=\mathcal{P}$. Note that in the presence of gravity it is not possible to make all the second derivatives of $\boldsymbol{g}$ vanish, since second derivatives will turn out to be related to spacetime curvature.${ }^{25}$
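The first half of the theorem, reducing $\boldsymbol{g}$ to $\boldsymbol{\eta}$ at a single point, is pure linear algebra and can be sketched numerically. The example assumes an illustrative symmetric matrix with a non-zero time-space component, of the kind that appears in a rotating frame, with $\Omega=0.5$ and $r=1$ chosen arbitrarily ($c=1$); `to_minkowski` is a hypothetical helper. It diagonalizes $g$ with its orthonormal eigenvectors and rescales each one by $|w|^{-1/2}$, where $w$ is the corresponding eigenvalue. The sketch does nothing about the first derivatives of the metric, which need the more careful construction of Riemann normal coordinates (Chapter 35).

```python
import numpy as np

def to_minkowski(g):
    """Return a matrix L whose columns are new basis vectors at the point,
    chosen so that L.T @ g @ L = diag(-1, 1, 1, 1)."""
    w, V = np.linalg.eigh(g)            # g = V diag(w) V^T, w in ascending order
    if np.count_nonzero(w < 0) != 1 or np.any(np.isclose(w, 0.0)):
        raise ValueError("g does not have Lorentz signature")
    # Rescale eigenvector j by 1/sqrt|w_j|; ascending order puts the single
    # negative eigenvalue first, so the timelike vector becomes column 0.
    return V / np.sqrt(np.abs(w))

Omega, r = 0.5, 1.0                     # illustrative values (c = 1)
g = np.array([[-(1 - Omega**2 * r**2), 0.0, Omega * r**2, 0.0],
              [0.0,                    1.0, 0.0,          0.0],
              [Omega * r**2,           0.0, r**2,         0.0],
              [0.0,                    0.0, 0.0,          1.0]])

L = to_minkowski(g)
print(np.round(L.T @ g @ L, 12))        # approximately diag(-1, 1, 1, 1)
```

The signature check mirrors the requirement that the observer's local frame has the Lorentz signature; a metric with two negative eigenvalues could not be brought to $\boldsymbol{\eta}$ this way.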
The freely falling frame of reference (used by the falling observer) that we have described in this chapter is an important example of a LIF. LIFs are very useful in that (i) as we've said, the laws of physics are identical in LIFs and in general frames, subject to the change $\boldsymbol{\eta}\rightarrow\boldsymbol{g}$; and (ii) analysing physics in LIFs is invariably far easier than doing so in curved spacetime.
We shall also use the idea that we can straightforwardly identify frames in which $\boldsymbol{g}=\boldsymbol{\eta}$ at the origin. By design, an observer erects an orthogonal set of basis vectors where they are situated, and normalizes this basis. The reason we're interested in these local orthonormal frames is that observations and measurements can be thought of as being made in them. This provides the final lesson in this chapter:
Lesson 6: An observation is made and interpreted by an observer in a local orthonormal frame, who uses the Minkowski tensor at the point they inhabit in spacetime.
The picture to have in mind is of the observer in a laboratory using their set of orthonormal axes, constructed from short, rigid rods, as an instrument to interpret the components of vectors locally.
6.5 Time dilation in a gravitational field
Our discussion so far has been very general and, consequently, a little abstract. We have yet to see how the curvature of spacetime that encodes gravity via the metric $\boldsymbol{g}$ has any effect beyond the possibility of Newtonian-style gravitational attraction. To give an idea of how $\boldsymbol{g}$ affects measurements we shall conclude this chapter by illustrating the influence of the metric on measurements made by two distant observers in an inhomogeneous gravitational field. Here gravitation leads to time dilation and a gravitational shift of the frequencies of light signals.
Example 6.8
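Consider a clock at rest in some coordinate system. The clock ticks (which we will take to be infinitesimally separated by a coordinate interval $\mathrm{d}t$) will then be separated by the proper time interval $\mathrm{d}\tau$, where
\begin{equation*}
\mathrm{d}\tau^{2}=-g_{\alpha\beta}\,\mathrm{d}x^{\alpha}\,\mathrm{d}x^{\beta}=-g_{00}\,\mathrm{d}t^{2},
\end{equation*}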
where we have upgraded the flat metric $\eta_{\alpha\beta}$ to the curved-space metric $g_{\alpha\beta}$. If the clock were sitting 'at infinity', well away from any sources of gravitational field, it would indeed be in flat space, so $g_{00}=\eta_{00}=-1$ and $\mathrm{d}t=\mathrm{d}\tau$. However, if the clock were a distance $r$ from a star of mass $M$ then we could use the metric of eqn 5.22 (assuming the Newtonian limit holds) and hence $g_{00}=-(1-2GM/r)$. In this case,${ }^{26}$
\begin{equation*}
\mathrm{d} \tau=\left(-g_{00}\right)^{1 / 2} \mathrm{~d} t=\left(1-\frac{2 G M}{r}\right)^{1 / 2} \mathrm{~d} t \tag{6.11}
\end{equation*}
That is to say that the interval $\mathrm{d}\tau$ between ticks of a clock measured in a particular frame depends on the details of the metric and hence on the gravitational field. Since $(1-2GM/r)^{1/2}<1$, we have $\mathrm{d}\tau<\mathrm{d}t$, and so this is gravitational time dilation. Note that the factor $2GM/r$ that enters this expression is just the square of the classical escape velocity $v_{\text{esc}}$ at distance $r$ from the star, which appears when you equate the kinetic energy $\frac{1}{2}mv_{\text{esc}}^{2}$ to the gravitational potential energy $GMm/r$ for a test mass $m$. Thus we could write eqn 6.11 as $\mathrm{d}\tau=\mathrm{d}t\sqrt{1-\left(v_{\text{esc}}/c\right)^{2}}$. If instead we wish to write this expression in terms of the Schwarzschild radius${ }^{27}$ $r_{\mathrm{S}}=2GM/c^{2}$, then we could write it as
\begin{equation*}
\mathrm{d} \tau=\mathrm{d} t \sqrt{1-\frac{r_{\mathrm{S}}}{r}} \tag{6.12}
\end{equation*}
${ }^{26}$ Remember that we are using units for which $c=1$. The gravitational time-dilation factor is $\left[1-2GM/\left(c^{2}r\right)\right]^{1/2}$ if you put the factors of $c$ back in.
${ }^{27}$ This quantity will be discussed in detail in Part IV of the book.
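As a check that the three forms of the time-dilation factor (via $g_{00}$, via $v_{\text{esc}}$ and via $r_{\mathrm{S}}$) are the same expression, the sketch below evaluates all of them with the factors of $c$ restored. It uses an illustrative neutron star of $1.4M_{\odot}$ and radius 10 km; the constants are rounded SI values and the function name `dilation_forms` is hypothetical.

```python
import math

G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
M_SUN = 1.989e30     # solar mass, kg

def dilation_forms(M, r):
    """dtau/dt for a static clock at radius r from mass M, computed three ways."""
    via_g00 = math.sqrt(1 - 2 * G * M / (C**2 * r))   # (-g_00)^(1/2), c restored
    v_esc = math.sqrt(2 * G * M / r)                  # Newtonian escape speed
    via_vesc = math.sqrt(1 - (v_esc / C) ** 2)
    r_s = 2 * G * M / C**2                            # Schwarzschild radius
    via_rs = math.sqrt(1 - r_s / r)
    return via_g00, via_vesc, via_rs

# Illustrative neutron star: 1.4 solar masses, radius 10 km
factors = dilation_forms(1.4 * M_SUN, 10e3)
print(factors)   # three equal values of about 0.77
```

A clock on such a star ticks at roughly three quarters of the rate of an identical clock far away, which is why the effect is described as extremely significant for neutron stars.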
Fig. 6.5 Identical clocks 1 and 2 are held fixed, a long way from the star and at distance $r$, respectively. (a) A third clock is released when it is next to clock 1. (b) Clock 3 is travelling at speed $v$ by the time it ends up next to clock 2. The diagram is schematic, so all three clocks and the star should be in a straight line (and of course the star will be much bigger!).
Let's derive eqn 6.11 a different way. Consider two identical clocks, the first held well away from the star and a second at a distance $r$ from it (see Fig. 6.5). Take a third identical clock, to measure the first two, and release it at the position of clock 1 [see Fig. 6.5(a)]. Clock 3 is in free fall in the gravitational field of the star, and so the interval between its ticks can be taken to be $\mathrm{d}\tau$ in its LIF. Immediately after releasing clock 3, we find that clock 1 and clock 3 agree (their ticks are in sync) because clock 3 is initially barely moving and so no relativistic corrections are needed (just as we found above: $\mathrm{d}t=\mathrm{d}\tau$). However, clock 3 starts to accelerate towards the star as it is drawn inexorably towards it, and by the time it reaches clock 2 it will be moving much faster, let's say at a speed $v$ [see Fig. 6.5(b)]. We are in the Newtonian limit so its kinetic energy $\frac{1}{2}mv^{2}$ has been obtained from releasing gravitational potential energy $GMm/r$, implying that $v^{2}=2GM/r$. Because clock 3 is instantaneously in free fall, gravity is absent in its inertial reference frame and so it has the same time interval between ticks $\mathrm{d}\tau$ as it had previously. However, the interval between ticks for clock 2 will be $\mathrm{d}t=\gamma\,\mathrm{d}\tau$ where $\gamma=\left(1-v^{2}\right)^{-1/2}$, so that once again
\begin{equation*}
\mathrm{d} \tau=\left(1-\frac{2 G M}{r}\right)^{1 / 2} \mathrm{~d} t \tag{6.13}
\end{equation*}
We have centred our discussion around clocks, but we could have framed the argument around atoms emitting light of a well-defined frequency due to some atomic transition. In this case, when light emitted by an atom in a gravitational field is detected far away, its frequency is found to be lower than that of the same transition occurring at infinity. In terms of wavelength, the light has been shifted towards the red end of the spectrum and, as a result, we call the effect gravitational redshift.
Bear in mind, though, that the calculation in this example was carried out in the Newtonian limit, meaning that $GM/(rc^{2})\ll 1$, and so one needs to check that this limit holds before using eqn 6.11 to perform a calculation. However, if this limit does hold we can simplify eqn 6.11 and write the time-dilation factor as $1-GM/r$ using the binomial theorem to expand the square root.
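The size of the error made by the binomial approximation can be seen in a short sketch, writing $x=GM/(rc^{2})$. Expanding $(1-2x)^{1/2}$ shows that the neglected term is of order $x^{2}/2$, so the approximation is excellent for the Earth or the Sun but fails badly for a neutron star, for which $x\approx 0.2$. The sample values of $x$ and the function names are illustrative.

```python
import math

def exact_factor(x):
    """Time-dilation factor (1 - 2x)^(1/2), with x = GM/(r c^2)."""
    return math.sqrt(1.0 - 2.0 * x)

def binomial_factor(x):
    """First-order binomial approximation 1 - x."""
    return 1.0 - x

# x = 0.2 is roughly a 1.4 solar-mass neutron star of radius 10 km
for x in (1e-9, 1e-6, 1e-3, 0.2):
    error = abs(exact_factor(x) - binomial_factor(x))
    print(f"x = {x:g}: exact {exact_factor(x):.10f}, "
          f"approx {binomial_factor(x):.10f}, error {error:.1e}")
```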
In Exercise 6.1, you can put numbers into these formulae, but suffice to say that the gravitational redshift between an observer on the Earth's surface and one in deep space is tiny. For an observer on the surface of a neutron star (whose radius might only be 10 km, but whose mass could be something like $1.4M_{\odot}$) the effect is extremely significant. However, even though the effect on Earth is extremely tiny, it must be taken into account for the proper working of satellite navigation based on the Global Positioning System (GPS). This relies on accurate timing of signals coming from a network of satellites and received by an observer who wants to know where on the Earth's surface she is. Relativistic effects need to be taken into account for this to be accurate, first because the satellites are in motion with respect to the ground-based observer (special relativity correction) and second because the satellites sit higher up in the Earth's gravitational field than the ground-based observer (general relativity correction due to the gravitational redshift).
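A rough, weak-field estimate of the two GPS corrections can be sketched as follows. It assumes a spherical, non-rotating Earth, a ground clock at rest on its surface (for example at a pole), a circular orbit of radius about $2.656\times10^{7}$ m (an approximate figure, not taken from this text), and the binomial forms of both effects: the gravitational rate difference $GM(1/R-1/r)/c^{2}$ and the kinematic slowing $v^{2}/(2c^{2})$ with $v^{2}=GM/r$. The constants are standard rounded values.

```python
C = 299_792_458.0          # speed of light, m/s
GM_EARTH = 3.986004418e14  # Earth's gravitational parameter GM, m^3/s^2
R_EARTH = 6.371e6          # mean Earth radius, m (spherical Earth assumed)
R_ORBIT = 2.656e7          # approximate GPS orbital radius, m (circular orbit assumed)
DAY = 86_400.0             # seconds per day

# Gravitational part: the orbiting clock is higher in the potential and runs
# fast relative to the ground clock by about GM/c^2 * (1/R - 1/r).
grav_rate = GM_EARTH / C**2 * (1 / R_EARTH - 1 / R_ORBIT)

# Kinematic (special relativity) part: v^2 = GM/r for a circular orbit, and the
# moving clock runs slow by about v^2 / (2 c^2).
kin_rate = (GM_EARTH / R_ORBIT) / (2 * C**2)

grav_us = grav_rate * DAY * 1e6      # microseconds gained per day
kin_us = kin_rate * DAY * 1e6        # microseconds lost per day
net_us = grav_us - kin_us

print(f"gravitational +{grav_us:.1f} us/day, kinematic -{kin_us:.1f} us/day, "
      f"net +{net_us:.1f} us/day")
```

The two effects have opposite signs and the gravitational one dominates. Since light travels about 300 m in a microsecond, uncorrected offsets of tens of microseconds per day would quickly ruin positioning.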
This is the second form of redshift we have encountered. The first (seen in Exercise 4.8) was due to the Doppler effect in flat spacetime. We will encounter a third form of redshift when we discuss cosmology; that form results from the expansion of spacetime itself over very large distances. Gravitational and cosmological redshifts arise from changes in the spacetime metric; this distinguishes them from the special-relativistic Doppler effect, which is due to the relative velocity of source and observer.
In the next chapter, we will continue to explore the properties of curved spacetime and find out how to define a derivative of a vector in a generally covariant way. As we have learnt in this chapter, it is only generally covariant quantities that will be admissible in any theory that aspires to describe the physical universe.
Chapter summary
The principle of equivalence tells us that in every local inertial frame all non-gravitational laws of physics must take on their special relativistic forms.
The principle of general covariance tells us that physical laws must take the same form in all coordinate systems.
We have used these principles to describe time dilation in a gravitational field.
Exercises
(6.1) Estimate the gravitational redshift (the factor by which a clock in a gravitational field runs slow compared to one subject to zero gravitational field) for the following cases: (a) a clock on the surface of the Earth; (b) a clock on the surface of the Sun; (c) a clock on the surface of a solar-mass white dwarf with radius $10^{3}\,\mathrm{km}$.
(6.2) A recent experiment uses clouds of ${}^{87}\mathrm{Sr}$ atoms at around 100 nK, loaded into an optical lattice and operated as a sophisticated atomic clock [T. Bothwell et al., Nature 602, 420 (2022)]. It is possible to measure the gravitational redshift across the millimetre scale of this system, and the laboratory experiment gives a value of the frequency gradient of around $-1.0(2)\times10^{-19}\,\mathrm{mm}^{-1}$. Is this consistent with what you would expect from general relativity?
(6.3) A satellite is in a circular orbit of radius $r$ around a planet of radius $R$ and mass $M$. Show that a clock on the satellite runs faster than a clock on the surface of the planet, located at one of the poles, by a factor of approximately $1+\frac{GM}{c^{2}}\left[\frac{1}{R}-\frac{3}{2r}\right]$. Hence show that there is one possible orbit radius for which the two clocks run at the same rate. Hint: you not only need the gravitational time dilation but also the effect due to the satellite moving (i.e. the special relativity time dilation), which is (at least instantaneously) in a straight line.
Estimate the factor for a geostationary satellite orbiting around the Earth.
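(6.4) Consider the Schwarzschild metric line element, which describes the spacetime around spherically symmetric stars (here we have taken $G=c=1$)
\begin{equation*}
\mathrm{d}s^{2}=-\left(1-\frac{2M}{r}\right)\mathrm{d}t^{2}+\left(1-\frac{2M}{r}\right)^{-1}\mathrm{d}r^{2}+r^{2}\left(\mathrm{d}\theta^{2}+\sin^{2}\theta\,\mathrm{d}\phi^{2}\right).
\end{equation*}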
(a) What is the proper time interval, measured by an observer at rest, between events at coordinate time $t$ and $t+\mathrm{d}t$ that both occur at a point $(r,\theta,\phi)$? (b) Now consider two observers at rest in this spacetime. An atom undergoes an atomic transition at position $\left(r_{1},\theta,\phi\right)$. What is the time interval between two successive wavefronts measured at point $\left(r_{1},\theta,\phi\right)$?
(c) What is the interval between wavefronts measured at $r_{2}$ from the experiment that takes place at $r_{1}$?
Fig. 6.6 Light signal sent from $A$ to $B$ and back to $A$ (Exercises 6.5 and 6.6).
(6.5) Consider a measurement of length involving a light signal being sent from point $A$ to $B$ and then back to $A$, as shown in Fig. 6.6. Multiplying $c$ by the time that the observer at $A$ measures for this process gives twice the distance between the points.
(a) By considering the interval of coordinate time that elapses for a signal sent between $A$ and $B$, show that we obtain
\begin{equation*}
\mathrm{d}t=\frac{1}{g_{00}}\left\{-g_{0i}\,\mathrm{d}x^{i}\pm\left[\left(g_{0i}g_{0j}-g_{ij}g_{00}\right)\mathrm{d}x^{i}\,\mathrm{d}x^{j}\right]^{\frac{1}{2}}\right\}.
\end{equation*}
(b) What do the two roots correspond to?
(c) Show that the corresponding proper time interval measured by the observer at AA for the signal to be sent and received back is
If the $g_{ij}$ depend on $x^{0}$ so that the spatial components are time dependent, it would not make sense to integrate this expression to obtain a general expression for proper time, since the integral would depend on the world line between the two points in space.
(6.6) Consider again the setup in Exercise 6.5 with light signals sent between $A$ and $B$, and call the time on $B$'s world line when the light signal is received $x^{0}$. We define the time on $A$'s world line that is simultaneous to this to be halfway between emission and reception of the light signals.
(a) Show that this time is given by
Attempting to use this formula to synchronize clocks on a closed path, such as a rotating disc, will fail, since the integral will not vanish.
(b) Using the metric for the rotating reference frame from Exercise 3.5, show that the discrepancy over one circuit is
when $\Omega r\ll 1$. To the same level of approximation, the discrepancy in proper time is $\Delta\tau=\sqrt{-g_{tt}}\,\Delta t\approx\Delta t$.
(c) By comparing the optical path lengths of two counter-propagating beams along a rotating circular fibre, show that the rotation causes a shift in their interference pattern of
where $\lambda$ is the wavelength of the light. This shift is known as the Sagnac effect after Georges Sagnac (1869-1928).
Parallel lines and the covariant derivative
We never remark any passion or principle in others, of which, in some degree or other, we may not find a parallel in ourselves.
David Hume (1711-1776) A Treatise of Human Nature
Comparisons are odorous
William Shakespeare (1564-1616)
Much Ado About Nothing III:5
7.1 Parallelism
In order to describe physical quantities in general relativity, we shall need to evaluate mathematical objects (functions, vector and tensor fields, and so forth) at particular points in spacetime. We shall also identify differential equations for these objects to understand how they change with position in spacetime. This requires the notion of a derivative. When spacetime is curved, some of our basic assumptions about vectors and their derivatives break down. This chapter is concerned with finding a method to take derivatives of vectors with respect to position in spacetime in cases where the spacetime is curved.
Example 7.1
Probably the most familiar example of a curved space is the one in which we live: the Earth's surface. In navigating around our home town, we might use a street map whose coordinates form a two-dimensional rectangular grid aligned with north-south and east-west axes. But this street map only works locally and can't be extended to the whole planet because of the curvature of the Earth. Nevertheless, we can imagine smoothly transitioning between lots of small rectangular maps to cover the whole surface of the globe.
In much the same way, $(3+1)$-dimensional spacetime can be covered smoothly using lots of locally flat maps based on four coordinates. A spacetime that can be covered by a smoothly changing set of coordinates is known in mathematics as a manifold.${ }^{1}$ The study of smoothly changing spaces is known as differential geometry and is the subject of Part V of this book. Owing to its smoothness, a manifold describing spacetime is necessarily flat over a sufficiently small region, across which it looks identical to the Minkowski spacetime encountered in the first part of this book. Over larger distances, however, the spacetime might be curved, and it is this curvature that we describe over the next few chapters, with the derivative formulated in this chapter an essential first step.
${ }^{2}$ For example, if we think of the surface of the Earth then a straight-line path from New York to Paris would plough through the Earth, burrowing hundreds of kilometres underground. This directed straight-line path is then outside the space we are trying to describe.
Fig. 7.1 A tangent vector $\boldsymbol{t}$ is parallel transported around a surface. Its components are always the same in the local coordinate system, but $\boldsymbol{t}$ changes when viewed in the $(X,Z)$ coordinate system set up by observers able to embed the space in higher dimensions.
${ }^{3}$ Imagine a tourist using a street map in London. In absolute terms, their North-direction is rather different to the North-direction of an analogous tourist in Tokyo, even though it is analogously defined in both city street maps as a particular vector tangent to the Earth. The North-direction could be thought of as being parallel transported between the two cities.
Fig. 7.2 Failure of parallelism for a sphere embedded in $\mathbb{R}^{3}$. The vectors drawn on the surface of the sphere all point in the same absolute direction, according to the embedding in $\mathbb{R}^{3}$, but they only lie in the tangent plane at one point; elsewhere, they point out of the tangent plane.
Consider a curved spacetime in which observers are confined. The picture to have in mind is of ants confined to a two-dimensional surface such as a football, or of humans confined to the surface of the Earth. All measurements are to be made in the surface, so the observers are not allowed to float above it to take advantage of its being embedded in three-dimensional space. The idea of a vector as a directed straight line joining two points is fine for flat space, but ceases to be of much use in a curved space.${ }^{2}$ The notion of a vector as a tangent to a path is, however, of much more use. Picture how the tangent vector to a path on a two-dimensional surface embedded in three-dimensional space will change its direction as we move it around the curved space, in order for it to still lie in the tangent plane of the surface. This behaviour of the tangent vector will provide a measure of vectors being parallel.
Next, we imagine that the path in the surface is one that doesn't change direction according to the observers (e.g. a great circle on a sphere, as shown in Fig. 7.1). The tangent vector of this path should, according to the trapped ants, be 'the same', or parallel, at all points on the path. A tangent vector to the path at some point may then be transported to a different point on the path and, if it is identical to the tangent vector determined at its new position, then we say that the vector has been parallel transported (Fig. 7.1). From the point of view of the observers confined to the surface, two vectors can then be compared at two different points in spacetime.
To generalize beyond tangent vectors: our trapped ants set up a coordinate system with which to make measurements. Measurements are always made locally, so they parallel transport their set of axes to the point where they want to measure the orientation of a vector. (The local coordinate system might, for example, use the tangent vector described above, and another axis in the surface orthogonal to this direction.) From this point of view, any vector that has the same components in each of the local coordinate systems is judged to be parallel at the different points. However, these vectors appear to change directions when we view the surface as embedded in a higher-dimensional space, just as the coordinate system used by the ants on the surface appears to change with position (Fig. 7.1).${ }^{3}$ This concept of parallelism will be included when we describe derivatives in curved spacetime, since the derivative of a field of parallel vectors should come out to be zero.
Example 7.2
The vectors in Fig. 7.2 are all parallel in three-dimensional space $\mathbb{R}^{3}$. However, for observers living on the surface of the sphere, the vectors are not parallel: they all make different angles to the tangent plane of the sphere's surface. The result of correctly parallel transporting a vector along several paths on a spherical surface is shown in Fig. 7.3. The components of the vector are identical in each of the local coordinate systems that the confined observers set up.
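The behaviour shown in Fig. 7.3 can be reproduced with a short sketch. It uses the embedding of the unit sphere in $\mathbb{R}^{3}$ purely as a computational convenience (the confined observers cannot do this): after each small step along the path, the vector is projected back into the new tangent plane and renormalized, so it never acquires a component sticking out of the surface. In the limit of small steps this approximates parallel transport. Carrying a vector once around the circle of colatitude $\theta_0$ then rotates it by the solid angle enclosed, $2\pi(1-\cos\theta_0)$, a standard result that is assumed here rather than derived; the function name and step count are illustrative.

```python
import numpy as np

def transport_around_colatitude(theta0, n_steps=40000):
    """Carry a unit tangent vector once around the circle of colatitude theta0
    on the unit sphere, re-projecting it into the tangent plane after each step.
    Returns the initial and final vectors (as 3-vectors in the embedding)."""
    def position(phi):
        return np.array([np.sin(theta0) * np.cos(phi),
                         np.sin(theta0) * np.sin(phi),
                         np.cos(theta0)])

    # Start at phi = 0 with the unit vector along e_theta (pointing 'south').
    v0 = np.array([np.cos(theta0), 0.0, -np.sin(theta0)])
    v = v0.copy()
    for phi in np.linspace(0.0, 2 * np.pi, n_steps + 1)[1:]:
        n = position(phi)              # unit normal of a unit sphere = position
        v = v - np.dot(v, n) * n       # remove the part sticking out of the surface
        v = v / np.linalg.norm(v)      # keep the length fixed
    return v0, v

theta0 = np.pi / 4
v0, v1 = transport_around_colatitude(theta0)
angle = np.arccos(np.clip(np.dot(v0, v1), -1.0, 1.0))
print(angle, 2 * np.pi * (1 - np.cos(theta0)))   # both close to 1.84 rad
```

Setting `theta0 = np.pi / 2` (the equator, a great circle) returns the vector unchanged, consistent with a great circle being a path that doesn't change direction according to the confined observers.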
7.2 Derivatives and connections
We now turn to a method to evaluate the change in a vector with position. In order to evaluate, via a derivative, the change in a vector as it is transported around, we need to disentangle two effects. The first is the intrinsic change in the vector with position, which is what we want the derivative to output. The second is the change in the vector reflecting the fact that the coordinates (or, equivalently, the basis vectors) change in different parts of space. To extract the intrinsic change we define a new kind of derivative of the vector. This is the covariant derivative, which evaluates the intrinsic change in the vector $\boldsymbol{v}$. We can make sense of the covariant derivative conceptually as
\begin{equation*}
\binom{\text{Covariant}}{\text{derivative of }\boldsymbol{v}}_{\boldsymbol{u}}=\binom{\text{Change in}}{\text{vector }\boldsymbol{v}}-\binom{\text{Change due to}}{\text{coordinate system}}.
\end{equation*}
This derivative is directional: it tells us the change in $\boldsymbol{v}$ as we move along the vector $\boldsymbol{u}$ (hence the subscript in the previous equation).
Example 7.3
The covariant derivative in a curved space generalizes the notion of a directional derivative in ordinary calculus. The gradient of a scalar function $f$ in Euclidean 3-space is given by
This is interpreted as a vector normal to the tangent plane of the surface of constant $f$. If we want to know the change of $f(x, y, z)$ along a particular vector $\vec{u}$ we use the directional derivative, defined (to first order in the length of $\vec{u}$) as
\begin{equation*}
\vec{u} \cdot \vec{\nabla} f=\binom{\text{Value of } f}{\text{at tip of } \vec{u}}-\binom{\text{Value of } f}{\text{at base of } \vec{u}} \tag{7.4}
\end{equation*}
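As a quick numerical illustration of eqn 7.4, here is a minimal Python sketch. The field $f(x,y,z)=x^{2}+3yz$, the base point and the vector $\vec{u}$ are hypothetical choices made only for this check. Shrinking $\vec{u}$ by a factor $\varepsilon$ and dividing the 'tip minus base' difference by $\varepsilon$ shows it approaching $\vec{u}\cdot\vec{\nabla} f$ as $\varepsilon \to 0$.

```python
# Minimal sketch (hypothetical field and vector): eqn 7.4 holds to first
# order in the length of u.


def f(x, y, z):
    """Hypothetical scalar field used only for this illustration."""
    return x**2 + 3.0 * y * z


def grad_f(x, y, z):
    """Analytic gradient (df/dx, df/dy, df/dz) of the field above."""
    return (2.0 * x, 3.0 * z, 3.0 * y)


def directional_derivative(point, u):
    """u . grad f evaluated at `point`."""
    return sum(ui * gi for ui, gi in zip(u, grad_f(*point)))


def tip_minus_base(point, u, eps):
    """[f(tip of eps*u) - f(base of eps*u)] / eps."""
    tip = tuple(p + eps * ui for p, ui in zip(point, u))
    return (f(*tip) - f(*point)) / eps


point = (1.0, 2.0, -1.0)
u = (0.5, 1.0, 2.0)
print("u . grad f =", directional_derivative(point, u))
for eps in (1e-1, 1e-3, 1e-5):
    print(eps, tip_minus_base(point, u, eps))
```

For this quadratic field the difference quotient works out to $10 + 6.25\varepsilon$, so the discrepancy from $\vec{u}\cdot\vec{\nabla} f = 10$ shrinks linearly with $\varepsilon$.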
Now to take the derivative. Consider the vector $\boldsymbol{v}=v^{\mu}\boldsymbol{e}_{\mu}$. We take a derivative with respect to the coordinates, allowing both the components $v^{\mu}$ and the basis vectors $\boldsymbol{e}_{\mu}$ to change in spacetime. The derivative we shall take is $\partial/\partial x^{\alpha}$, which should be thought of as the directional derivative along the direction $\boldsymbol{e}_{\alpha}$. Employing the Leibniz product rule, we have
\begin{equation*}
\frac{\partial \boldsymbol{v}}{\partial x^{\alpha}}=\frac{\partial v^{\mu}}{\partial x^{\alpha}}\boldsymbol{e}_{\mu}+v^{\mu}\frac{\partial \boldsymbol{e}_{\mu}}{\partial x^{\alpha}}.
\end{equation*}
The tricky second term on the right is due to the change in basis vectors with position. The derivative of the basis vector $\boldsymbol{e}_{\mu}$ can have components along any of the basis vectors, so to express this we define connection coefficients, also known as Christoffel symbols,${}^{4}$ denoted $\Gamma^{\mu}{}_{\alpha\beta}$, and
Fig. 7.3 Parallel transport of a vector on a spherical surface. ${}^{4}$ Elwin Bruno Christoffel (1829-1900). The mathematics described here was invented by Christoffel in 1869 and further explored by Gregorio Ricci-Curbastro (1853-1925) who, in the years leading to 1900, developed the mathematical machinery employed by Einstein in developing general relativity. We shall follow the modern convention of calling the symbols 'connection coefficients' in this book.
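then write the change in basis vectors as
\begin{equation*}
\frac{\partial \boldsymbol{e}_{\mu}}{\partial x^{\alpha}}=\Gamma^{\lambda}{}_{\mu\alpha}\boldsymbol{e}_{\lambda}.
\end{equation*}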
The connection coefficients encode all of the information describing how the coordinates change as we move around spacetime. Another way of thinking about this is that, since there are different local coordinate systems at different points in space, the connection coefficients tell us how the coordinate systems are connected, that is, how to translate between coordinate systems as we move around.${}^{5}$
Example 7.4
In the following two chapters, we shall find a simple and efficient means of extracting connection coefficients. Before we get to that, here is a simple, 'brute force and ignorance' example, based on eqn 7.6. We saw in Chapter 3 that for a plane-polar coordinate system, the basis vectors have derivatives${}^{6}$
Although Euclidean space described by plane-polar coordinates is flat, we still have non-zero connection coefficients. The presence of connection coefficients therefore does not alone tell us whether a space is curved.${}^{7}$
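The brute-force route of Example 7.4 can be mimicked numerically. The Python sketch below assumes the coordinate basis of Chapter 3 written in Cartesian components, $\boldsymbol{e}_{r}=(\cos\theta, \sin\theta)$ and $\boldsymbol{e}_{\theta}=(-r\sin\theta, r\cos\theta)$, so that $\boldsymbol{e}_{\theta}$ has length $r$. It differentiates each basis vector by central differences and decomposes the result as $\partial\boldsymbol{e}_{\mu}/\partial x^{\alpha}=\Gamma^{\lambda}{}_{\mu\alpha}\boldsymbol{e}_{\lambda}$. The helper names are hypothetical. It should recover $\Gamma^{r}{}_{\theta\theta}=-r$ and $\Gamma^{\theta}{}_{r\theta}=\Gamma^{\theta}{}_{\theta r}=1/r$, with all other coefficients zero, even though the plane is flat.

```python
# Minimal sketch: plane-polar connection coefficients from derivatives of the
# basis vectors (coordinate basis assumed; index 0 = r, index 1 = theta).
import math


def basis(r, theta):
    """Coordinate basis (e_r, e_theta) of plane-polar coordinates, Cartesian components."""
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-r * math.sin(theta), r * math.cos(theta))
    return [e_r, e_theta]


def solve2(a, b, v):
    """Solve c0*a + c1*b = v for (c0, c1) by Cramer's rule (2D vectors)."""
    det = a[0] * b[1] - a[1] * b[0]
    c0 = (v[0] * b[1] - v[1] * b[0]) / det
    c1 = (a[0] * v[1] - a[1] * v[0]) / det
    return c0, c1


def connection(r, theta, h=1e-6):
    """Gamma[lam][mu][alpha] defined by d e_mu / d x^alpha = Gamma^lam_{mu alpha} e_lam."""
    coords = [r, theta]
    e = basis(r, theta)
    gamma = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for alpha in range(2):
        plus = list(coords)
        minus = list(coords)
        plus[alpha] += h
        minus[alpha] -= h
        e_plus, e_minus = basis(*plus), basis(*minus)
        for mu in range(2):
            # Central-difference derivative of e_mu along x^alpha.
            de = [(p - m) / (2 * h) for p, m in zip(e_plus[mu], e_minus[mu])]
            gamma[0][mu][alpha], gamma[1][mu][alpha] = solve2(e[0], e[1], de)
    return gamma


G = connection(2.0, 0.7)
print("Gamma^r_{theta theta} =", G[0][1][1])   # approximately -r = -2
print("Gamma^theta_{r theta} =", G[1][0][1])   # approximately 1/r = 0.5
print("Gamma^theta_{theta r} =", G[1][1][0])   # approximately 1/r = 0.5
print("Gamma^r_{r r}         =", G[0][0][0])   # approximately 0
```

The symmetry $\Gamma^{\lambda}{}_{\mu\alpha}=\Gamma^{\lambda}{}_{\alpha\mu}$ also shows up in the output, anticipating the torsion-free property discussed next.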
In all of the coordinate frames that we examine in this book, the connection coefficients have the property
\begin{equation*}
\Gamma^{\mu}{}_{\alpha\beta}=\Gamma^{\mu}{}_{\beta\alpha}.
\end{equation*}
A connection with this property is often called symmetric or torsion free.
Finally, we might ask why we call the $\Gamma$s connection coefficients or Christoffel symbols. The answer is that, unlike most of the objects we deal with in relativity, they are not the components of a tensor.${}^{8}$
Example 7.5
Usually, we expect a tensor's components to transform as
From here on we shall write${}^{9}$ this quantity as $\boldsymbol{\nabla}_{\alpha}\boldsymbol{v}$, which we call the covariant derivative of $\boldsymbol{v}$ along the direction $\boldsymbol{e}_{\alpha}$. This new notation allows us to write the left-hand side of eqn 7.13 as $\boldsymbol{\nabla}_{\alpha}\boldsymbol{v}$, rather than as $\partial\boldsymbol{v}/\partial x^{\alpha}$, which turns out to be much more convenient.${}^{10}$ While we are in the process of introducing notation, a commonly used labour-saving shorthand for writing derivatives using commas and semicolons is given in the shaded box in the margin. Putting everything together, we have
In terms of our classification of tensors, notice that $\boldsymbol{\nabla}_{\alpha}\boldsymbol{v}$ is a $(1,0)$ object, just like a vector.
Example 7.6
We can immediately note that in Minkowski spacetime the Cartesian basis vectors do not change with position and so all of the $\Gamma$ coefficients are zero, giving the result that
We have worked out the covariant derivative along the direction of the basis vector $\boldsymbol{e}_{\alpha}$. What about the covariant derivative along an arbitrary vector? For this purpose we define the connection operator $\boldsymbol{\nabla}$ by writing $\boldsymbol{e}_{\alpha}\cdot\boldsymbol{\nabla}=\boldsymbol{\nabla}_{\alpha}$. We then define the action of our covariant derivative $\boldsymbol{\nabla}_{\alpha}$ on a scalar function simply as the derivative with respect to $x^{\alpha}$, or
\begin{equation*}
\boldsymbol{\nabla}_{\alpha} f=\frac{\partial f}{\partial x^{\alpha}}.
\end{equation*}
Generalizing to different directions by replacing $\boldsymbol{e}_{\alpha}$ with an arbitrary vector $\boldsymbol{u}$, we have the directional derivative
If we now interpret the action of the connection operator on vectors in the same way, $\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{v}$ is the covariant derivative of the vector $\boldsymbol{v}$ along the direction $\boldsymbol{u}$. We write this as
In words, this is the directional derivative along the direction $\boldsymbol{e}_{\alpha}$.${}^{10}$ Relating the old notation to the new notation, we have
What is important here is that the components of the derivative are not necessarily equal to the derivatives of the components. This is due to the non-zero connection coefficients. In fact, we can rewrite our definition of the derivative of components and basis vectors in the new notation as
The covariant derivative is written in semicolon notation so that the $\mu$ component of eqn 7.18 becomes $\left(\boldsymbol{\nabla}_{\alpha}\boldsymbol{v}\right)^{\mu}\equiv v^{\mu}{}_{;\alpha}=v^{\mu}{}_{,\alpha}+v^{\lambda}\Gamma^{\mu}{}_{\alpha\lambda}$,
and so $v^{\mu}{}_{;\alpha}$ are the components of the covariant derivative. This notation has the virtue of allowing some equations to be written more compactly, though this comes at the expense of leaving expressions littered with punctuation marks.${}^{11}$ In semicolon notation
The covariant derivative selects out the change in a vector with position owing to its genuine change, eliminating the contribution due to the changing of coordinates with position. If a vector is parallel transported along a path then the only change should be due to the coordinates changing. So parallel transport of a vector $\boldsymbol{v}$ along the direction $\boldsymbol{u}$ implies
\begin{equation*}
\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{v}=0 \quad \text{(parallel transport)} \tag{7.24}
\end{equation*}
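A concrete check of eqn 7.24 in flat space: the constant Cartesian field $\boldsymbol{v}=\boldsymbol{e}_{x}$ is parallel everywhere, so $\boldsymbol{\nabla}_{\alpha}\boldsymbol{v}=0$ whatever coordinates we use. In the plane-polar coordinate basis its components are $v^{r}=\cos\theta$ and $v^{\theta}=-\sin\theta/r$, which do change with position. The Python sketch below evaluates $v^{\mu}{}_{,\alpha}$ by central differences and adds $\Gamma^{\mu}{}_{\alpha\lambda}v^{\lambda}$, assuming the coordinate basis and the plane-polar values $\Gamma^{r}{}_{\theta\theta}=-r$ and $\Gamma^{\theta}{}_{r\theta}=\Gamma^{\theta}{}_{\theta r}=1/r$; the function names are illustrative only.

```python
# Minimal sketch: the constant field e_x has non-zero partial derivatives of its
# polar components but vanishing covariant derivative (index 0 = r, 1 = theta).
import math


def christoffel_polar(r):
    """Gamma[mu][alpha][lam] = Gamma^mu_{alpha lam}; unset entries are zero."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r          # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r     # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r     # Gamma^theta_{theta r}
    return G


def v_components(r, theta):
    """Polar components (v^r, v^theta) of the constant Cartesian field e_x."""
    return [math.cos(theta), -math.sin(theta) / r]


def derivatives(field, r, theta, h=1e-6):
    """Return (partial, covariant), indexed [mu][alpha]: v^mu_{,alpha} and v^mu_{;alpha}."""
    coords = [r, theta]
    v = field(r, theta)
    G = christoffel_polar(r)
    partial = [[0.0, 0.0], [0.0, 0.0]]
    covariant = [[0.0, 0.0], [0.0, 0.0]]
    for alpha in range(2):
        plus = list(coords)
        minus = list(coords)
        plus[alpha] += h
        minus[alpha] -= h
        v_plus, v_minus = field(*plus), field(*minus)
        for mu in range(2):
            partial[mu][alpha] = (v_plus[mu] - v_minus[mu]) / (2 * h)
            correction = sum(G[mu][alpha][lam] * v[lam] for lam in range(2))
            covariant[mu][alpha] = partial[mu][alpha] + correction
    return partial, covariant


partial, covariant = derivatives(v_components, 1.5, 0.4)
print("v^mu_{,alpha} =", partial)     # non-zero: the components vary
print("v^mu_{;alpha} =", covariant)   # zero up to rounding: v is parallel
```

The connection term exactly cancels the change in the components that is caused by the rotating polar basis.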
Example 7.7
This latter expression means that the components obey
In words, the change in the components $\partial v^{\mu}/\partial x^{\alpha}$ is, in this case, entirely due${}^{12}$ to the change in coordinates, $-\Gamma^{\mu}{}_{\alpha\beta}v^{\beta}$. That is to say that when a vector is parallel transported we have
\begin{equation*}
\binom{\text{Change in a}}{\text{vector's components}}=\binom{\text{Change due to}}{\text{coordinate system}} \tag{7.27}
\end{equation*}
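Parallel transport on a curved surface can be made concrete with Fig. 7.3 in mind. The Python sketch below, a minimal illustration rather than the book's method, transports a vector once around a circle of colatitude $\theta_{0}$ on the unit sphere. It assumes coordinates $(\theta,\phi)$ with $\mathrm{d}s^{2}=\mathrm{d}\theta^{2}+\sin^{2}\theta\,\mathrm{d}\phi^{2}$ and the standard coefficients $\Gamma^{\theta}{}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}{}_{\theta\phi}=\Gamma^{\phi}{}_{\phi\theta}=\cot\theta$. Along the path $\phi$ serves as the parameter and the tangent is $u^{\mu}=(0,1)$, so the parallel-transport condition reads $\mathrm{d}v^{\mu}/\mathrm{d}\phi=-\Gamma^{\mu}{}_{\phi\beta}v^{\beta}$, which is integrated with fourth-order Runge-Kutta steps. In the orthonormal frame the vector keeps its length, but after one circuit it has rotated by $2\pi\cos\theta_{0}$, which is equivalent modulo $2\pi$ to the solid angle $2\pi(1-\cos\theta_{0})$ enclosed by the loop.

```python
# Minimal sketch: parallel transport around a circle of colatitude theta0 on the
# unit sphere (components ordered as (v^theta, v^phi)).
import math


def transport_rhs(theta0, v):
    """dv^mu/dphi = -Gamma^mu_{phi beta} v^beta along the circle theta = theta0."""
    v_theta, v_phi = v
    s, c = math.sin(theta0), math.cos(theta0)
    return (s * c * v_phi, -(c / s) * v_theta)


def transport_around_loop(theta0, v0, steps=2000):
    """Parallel transport v0 once around the loop using RK4 in phi."""
    h = 2.0 * math.pi / steps
    v = tuple(v0)
    for _ in range(steps):
        k1 = transport_rhs(theta0, v)
        k2 = transport_rhs(theta0, tuple(x + 0.5 * h * k for x, k in zip(v, k1)))
        k3 = transport_rhs(theta0, tuple(x + 0.5 * h * k for x, k in zip(v, k2)))
        k4 = transport_rhs(theta0, tuple(x + h * k for x, k in zip(v, k3)))
        v = tuple(x + h * (q1 + 2 * q2 + 2 * q3 + q4) / 6.0
                  for x, q1, q2, q3, q4 in zip(v, k1, k2, k3, k4))
    return v


def orthonormal(theta0, v):
    """Components along the unit vectors e_theta and e_phi / sin(theta0)."""
    return (v[0], math.sin(theta0) * v[1])


theta0 = math.pi / 3                       # cos(theta0) = 1/2
a, b = orthonormal(theta0, transport_around_loop(theta0, (1.0, 0.0)))
print("after one loop:", (a, b))           # close to (-1, 0): rotated by pi
print("length squared:", a * a + b * b)    # stays 1
```

The loop must avoid the poles ($0<\theta_{0}<\pi$), where the coordinate system itself breaks down.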
7.4 Parametrized paths
We now have the covariant derivative at our disposal in the form of a directional derivative of some vector $\boldsymbol{v}$ taken along a vector $\boldsymbol{u}$. We shall also need the derivative in a form more suitable to apply to curves in spacetime such as the world lines of particles.
We saw in Chapter 1 that the most general way to describe a curve is to parametrize it by introducing a quantity that varies monotonically along its length. That is, a curve stretching from point $\lambda=a$ to $\lambda=b$ is written as $x(\lambda)$, where $\lambda$ parametrizes the curve. It marks off regular intervals, so we know how far along the curve we are, as shown in Fig. 7.4.
Example 7.8
For the two-dimensional space $\mathbb{R}^{2}$ a curve is given by $(x(\lambda), y(\lambda))$, i.e. with $x$ and $y$ both functions of $\lambda$.
A straight line $y=mx+c$ can be parametrized with $x(\lambda)=\lambda$ and $y(\lambda)=m\lambda+c$.
A parabola $y=x^{2}$ can be parametrized with $x(\lambda)=\lambda$ and $y(\lambda)=\lambda^{2}$.
A circle $x^{2}+y^{2}=a^{2}$ can be parametrized with $x(\lambda)=a\cos\lambda$ and $y(\lambda)=a\sin\lambda$.
The precise choice of parametrization isn't crucial. If an allowable parametrization is given by regular intervals of $\lambda$, we could equally well choose a different parametrization $\eta$ such that $\lambda=\alpha\eta+\beta$, where $\alpha$ and $\beta$ are constants (with $\alpha\neq 0$). Such a parametrization is called an affine parametrization.
With this in mind, we can return to the covariant derivative itself. In many cases, we are interested in the rate of change of a vector field, $\boldsymbol{v}(x)$, an object where we input a position $x=\mathcal{P}$ and output a vector $\boldsymbol{v}$ appropriate for that point $\mathcal{P}$. We then ask how rapidly the vector field $\boldsymbol{v}(x)$ changes along a curve $x^{\mu}(\lambda)$. This involves parametrizing the curve, and then we seek
\begin{equation*}
\binom{\text{Rate of change of}}{\boldsymbol{v} \text{ with respect to } \lambda} \equiv \frac{\mathrm{D}\boldsymbol{v}}{\mathrm{d}\lambda} \tag{7.28}
\end{equation*}
Here we've introduced some new notation: the covariant derivative with respect to an affine parameter is denoted $\mathrm{D}/\mathrm{d}\lambda$.${}^{13}$
In order to use the covariant derivative as we've defined it so far, we seek a vector telling us the direction along which to take the derivative. This is provided by the tangent vector to the curve $x^{\mu}(\lambda)$, given by${}^{14}$
\begin{equation*}
\boldsymbol{u}=\frac{\mathrm{d}x^{\mu}(\lambda)}{\mathrm{d}\lambda}\boldsymbol{e}_{\mu}.
\end{equation*}
That is, at every point $\lambda$ along the curve we have a tangent vector $\boldsymbol{u}$ (Fig. 7.5). This tangent vector is one of the most useful tools in this book. We then have
\begin{equation*}
\binom{\text{Rate of change of}}{\boldsymbol{v} \text{ with respect to } \lambda} \equiv \frac{\mathrm{D}\boldsymbol{v}}{\mathrm{d}\lambda} \equiv \boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{v} \equiv \binom{\text{Covariant derivative of}}{\boldsymbol{v} \text{ along } \boldsymbol{u}},
\end{equation*}
where $\boldsymbol{u}$ is the tangent vector field to the curve. For massive particles, which follow timelike curves, we shall usually choose $\lambda$ to be the proper time $\tau$, which allows us to interpret the tangent as the particle's velocity, and provides the useful constraint $\boldsymbol{u}\cdot\boldsymbol{u}=-1$.
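To see the constraint $\boldsymbol{u}\cdot\boldsymbol{u}=-1$ in action, the Python sketch below takes a hypothetical world line in Minkowski spacetime (units with $c=1$, metric $\mathrm{diag}(-1,1,1,1)$): uniform circular motion at speed $0.6$, parametrized by proper time. It forms $u^{\mu}=\mathrm{d}x^{\mu}/\mathrm{d}\lambda$ by central differences and evaluates $\eta_{\mu\nu}u^{\mu}u^{\nu}$. Reparametrizing with $\lambda=2\tau$, which is affine but is not the proper time, rescales the result to $-1/4$, which is why the choice $\lambda=\tau$ matters for the constraint.

```python
# Minimal sketch: u . u = -1 for a world line parametrized by proper time
# (hypothetical circular orbit; units with c = 1).
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # Minkowski metric diag(-1, 1, 1, 1)


def worldline(tau, radius=2.0, speed=0.6):
    """Circular motion in the x-y plane, parametrized by proper time tau."""
    gamma = 1.0 / math.sqrt(1.0 - speed**2)
    omega = speed / radius          # angular frequency with respect to coordinate time t
    t = gamma * tau                 # time dilation: dt/dtau = gamma
    return (t, radius * math.cos(omega * t), radius * math.sin(omega * t), 0.0)


def worldline_doubled(lam):
    """The same world line with the affine parameter lambda = 2 * tau."""
    return worldline(lam / 2.0)


def tangent(curve, lam, h=1e-5):
    """u^mu = dx^mu/dlambda by central differences."""
    plus, minus = curve(lam + h), curve(lam - h)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))


def minkowski_dot(a, b):
    """eta_{mu nu} a^mu b^nu."""
    return sum(g * ai * bi for g, ai, bi in zip(ETA, a, b))


for lam in (0.0, 1.3, 7.9):
    u = tangent(worldline, lam)
    w = tangent(worldline_doubled, lam)
    print(lam, minkowski_dot(u, u), minkowski_dot(w, w))   # about -1 and -0.25
```

Analytically, $u^{\mu}=\gamma(1,-v\sin\omega t, v\cos\omega t, 0)$, so $\boldsymbol{u}\cdot\boldsymbol{u}=-\gamma^{2}(1-v^{2})=-1$ at every point.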
Example 7.9
This way of thinking about the covariant derivative makes it similar to an ordinary derivative defined in terms of evaluating a function at two points, $f(x)$ and $f(x+\delta x)$, and taking the difference in the limit of small $\delta x$. However, key to the definition here is the notion of parallelism, which allows us to remove the change caused by the changing coordinate system. In order to do this, the instructions for taking the covariant derivative, in these terms, are as follows:
(i) Take the vector $\boldsymbol{v}$ at $\lambda=\lambda_{0}+\varepsilon$.
(ii) Parallel transport it back to $\lambda_{0}$.
(iii) Evaluate $\delta\boldsymbol{v}$, which measures how different it is from $\boldsymbol{v}$ at $\lambda_{0}$.
(iv) Divide by $\varepsilon$ and take the limit.
In equations, we have
\begin{equation*}
\frac{\mathrm{D}\boldsymbol{v}}{\mathrm{d}\lambda}=\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{v}=\lim_{\varepsilon\rightarrow 0}\left(\frac{\boldsymbol{v}\left(\lambda_{0}+\varepsilon\right)_{\left(\text{parallel transport to } \lambda_{0}\right)}-\boldsymbol{v}\left(\lambda_{0}\right)}{\varepsilon}\right) \tag{7.31}
\end{equation*}
${}^{13}$ The notation reminds us that, owing to the changes in the coordinate systems, the components of the covariant derivative $(\mathrm{D}\boldsymbol{v}/\mathrm{d}\lambda)^{\mu}$ will not generally be equal to the derivatives of the components $\mathrm{d}v^{\mu}/\mathrm{d}\lambda$. However, for a scalar field $f$ we do have $\mathrm{d}f/\mathrm{d}\lambda=\mathrm{D}f/\mathrm{d}\lambda$.
Fig. 7.5 The tangent vectors $\boldsymbol{u}=\left(\mathrm{d}x^{\mu}(\lambda)/\mathrm{d}\lambda\right)\boldsymbol{e}_{\mu}$ along the curve parametrized by $\lambda$. For $\lambda=\tau$ this provides a velocity vector (which itself varies along the path). ${}^{14}$ We do not write $\mathrm{d}\boldsymbol{x}/\mathrm{d}\lambda$ as we sometimes do in special relativity. As discussed in Chapter 3, the displacement vector $\boldsymbol{x}=x^{\mu}\boldsymbol{e}_{\mu}$, thought of as pointing a distance $|\boldsymbol{x}|$ from the origin to coordinate point $x^{\mu}$, does not transform appropriately, and so we won't use it in this form (e.g. by taking its derivative). Note also that the tangent vector is given by $\boldsymbol{u}=\left(\mathrm{D}x^{\mu}(\lambda)/\mathrm{d}\lambda\right)\boldsymbol{e}_{\mu}=\left(\mathrm{d}x^{\mu}(\lambda)/\mathrm{d}\lambda\right)\boldsymbol{e}_{\mu}$, since $x^{\mu}(\lambda)$ is a set of scalar functions.
Fig. 7.6 Taking the covariant derivative using eqn 7.31.
Example 7.10
We can check our new version of the covariant derivative of a vector $\boldsymbol{A}$ in the case of flat spacetime, where the connection coefficients expressed in Cartesian coordinates vanish. Writing out all the components, we have
where we've used the chain rule in the final step. We conclude that for flat spacetime (in Cartesian coordinates) the components of the derivative are simply the derivatives of the components with respect to the parameter $\lambda$.
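Example 7.10 uses Cartesian coordinates, where the components of $\mathrm{D}\boldsymbol{A}/\mathrm{d}\lambda$ reduce to $\mathrm{d}A^{\mu}/\mathrm{d}\lambda$. In curvilinear coordinates on the same flat plane the connection term survives. Assuming the linearity $\boldsymbol{\nabla}_{\boldsymbol{u}}=u^{\alpha}\boldsymbol{\nabla}_{\alpha}$, the component form of the covariant derivative and the chain rule give $(\mathrm{D}\boldsymbol{A}/\mathrm{d}\lambda)^{\mu}=\mathrm{d}A^{\mu}/\mathrm{d}\lambda+\Gamma^{\mu}{}_{\alpha\beta}u^{\alpha}A^{\beta}$. The Python sketch below applies this with $\boldsymbol{A}=\boldsymbol{u}$ in plane-polar coordinates, using $\Gamma^{r}{}_{\theta\theta}=-r$ and $\Gamma^{\theta}{}_{r\theta}=\Gamma^{\theta}{}_{\theta r}=1/r$ (coordinate basis assumed). For the straight line $x=\lambda$, $y=1$ the components $u^{\mu}$ change, yet $\mathrm{D}\boldsymbol{u}/\mathrm{d}\lambda=0$; for a circle of radius $a$ parametrized by its polar angle, the result is the centripetal $(\mathrm{D}\boldsymbol{u}/\mathrm{d}\lambda)^{r}=-a$. The curves and function names are illustrative choices.

```python
# Minimal sketch: Du/dlambda = du/dlambda + Gamma u u in plane-polar coordinates
# on the flat plane (index 0 = r, index 1 = theta).
import math


def gamma_polar(r):
    """Gamma[mu][a][b] = Gamma^mu_{ab} for plane-polar coordinates."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G


def covariant_acceleration(curve, lam, h=1e-3):
    """Return (Du/dlambda, du/dlambda) for a curve lam -> (r, theta), by central differences."""
    def u(l):
        p, m = curve(l + h), curve(l - h)
        return [(pi - mi) / (2 * h) for pi, mi in zip(p, m)]

    u0 = u(lam)
    du = [(up - um) / (2 * h) for up, um in zip(u(lam + h), u(lam - h))]
    G = gamma_polar(curve(lam)[0])
    Du = [du[mu] + sum(G[mu][a][b] * u0[a] * u0[b] for a in range(2) for b in range(2))
          for mu in range(2)]
    return Du, du


def straight_line(lam):
    """The line x = lam, y = 1 expressed as (r, theta)."""
    return (math.hypot(lam, 1.0), math.atan2(1.0, lam))


def circle(lam, radius=2.0):
    """A circle of the given radius, parametrized by its polar angle."""
    return (radius, lam)


Du, du = covariant_acceleration(straight_line, 0.5)
print("straight line: du/dl =", du, " Du/dl =", Du)   # du non-zero, Du about zero
Du_c, _ = covariant_acceleration(circle, 0.3)
print("circle: Du/dl =", Du_c)                        # about (-2, 0)
```

The straight line is affinely parametrized ($x=\lambda$), so a vanishing $\mathrm{D}\boldsymbol{u}/\mathrm{d}\lambda$ here is a coordinate-independent statement that the line does not bend.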
The covariant derivative notation $\mathrm{D}/\mathrm{d}\lambda$ proves very useful, not least because of its resemblance to the ordinary derivative.
7.5 Enter the metric
After formulating a covariant derivative, we might ask if this is the only way we could have constructed it. It turns out that our freedom to formulate it was restricted by the metric field $\boldsymbol{g}$, which is the foundation of our physical description of spacetime, and it is exactly this metric field that forces this version of the covariant derivative upon us. This idea is encapsulated in the notion of what is called the compatibility of the connection, which requires that the covariant derivative obeys
\begin{equation*}
\boldsymbol{\nabla}_{\alpha}\boldsymbol{g}=0
\end{equation*}
for all $\alpha$. This equation inseparably joins the metric and the covariant derivative.${}^{15}$ The importance of this condition is that, if it did not hold, then the lengths of vectors would change as we parallel transport them.${}^{16}$ This would be highly undesirable for a description of the physics of the real world.
${}^{16}$ Another consequence of this equation is that it provides the long-awaited explanation of what affine parametrizations actually are. They are those smooth parametrizations of a curve that have the property that the length of a vector doesn't change as we parallel transport the vector along the curve.
${}^{17}$ The Leibniz product rule does indeed hold, as discussed in Part V.
${}^{15}$ We do not yet have an explicit expression for how to compute the covariant derivative of a $(0,2)$ tensor like $\boldsymbol{g}$. We delay deriving the explicit expression until Part V. At this stage we state that it is given in components by $g_{\mu\nu;\alpha}=g_{\mu\nu,\alpha}-\Gamma^{\beta}{}_{\alpha\mu}g_{\beta\nu}-\Gamma^{\beta}{}_{\alpha\nu}g_{\mu\beta}$.
It is also helpful at this stage to note that the derivative of a $(2,0)$ tensor like $\boldsymbol{T}$ can be written in components as $T^{\mu\nu}{}_{;\alpha}=T^{\mu\nu}{}_{,\alpha}+\Gamma^{\mu}{}_{\alpha\beta}T^{\beta\nu}+\Gamma^{\nu}{}_{\alpha\beta}T^{\mu\beta}$.
Example 7.11
We shall prove the compatibility condition. The length of a vector is given by (the square root of) $\boldsymbol{A}\cdot\boldsymbol{A}=\boldsymbol{g}(\boldsymbol{A},\boldsymbol{A})$, or $g_{\mu\nu}A^{\mu}A^{\nu}$. Take $\boldsymbol{A}$ to be covariantly constant (i.e. parallel) such that $\boldsymbol{\nabla}_{\alpha}\boldsymbol{A}=0$ (or $A^{\mu}{}_{;\alpha}=0$). If the length of the vector is constant we expect the covariant derivative of $\boldsymbol{g}(\boldsymbol{A},\boldsymbol{A})$ to vanish. Assuming the Leibniz product rule${}^{17}$ we have
\begin{equation*}
\left(g_{\mu\nu}A^{\mu}A^{\nu}\right)_{;\alpha}=g_{\mu\nu;\alpha}A^{\mu}A^{\nu}+g_{\mu\nu}A^{\mu}{}_{;\alpha}A^{\nu}+g_{\mu\nu}A^{\mu}A^{\nu}{}_{;\alpha}=0.
\end{equation*}
Since $A^{\mu}{}_{;\alpha}=0$, and $\boldsymbol{A}$ can be any parallel vector, we must have $g_{\mu\nu;\alpha}=0$, as claimed.
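The component formula quoted for $g_{\mu\nu;\alpha}$ can be checked numerically. The Python sketch below assumes plane-polar coordinates with $g_{\mu\nu}=\mathrm{diag}(1,r^{2})$ and the coefficients $\Gamma^{r}{}_{\theta\theta}=-r$ and $\Gamma^{\theta}{}_{r\theta}=\Gamma^{\theta}{}_{\theta r}=1/r$, and evaluates $g_{\mu\nu;\alpha}=g_{\mu\nu,\alpha}-\Gamma^{\beta}{}_{\alpha\mu}g_{\beta\nu}-\Gamma^{\beta}{}_{\alpha\nu}g_{\mu\beta}$ with central differences. Every component vanishes. As a contrast, a deliberately wrong connection (a hypothetical doubling of $\Gamma^{\theta}{}_{r\theta}$ and $\Gamma^{\theta}{}_{\theta r}$) gives $g_{\theta\theta;r}=-2r\neq 0$, so that connection would not preserve lengths under parallel transport.

```python
# Minimal sketch: metric compatibility g_{mu nu; alpha} = 0 in plane-polar
# coordinates (index 0 = r, index 1 = theta).


def gamma_polar(r, scale=1.0):
    """Gamma[beta][alpha][mu] = Gamma^beta_{alpha mu}; scale != 1 gives a deliberately wrong connection."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = scale / r
    return G


def metric_polar(r, theta):
    """g_{mu nu} = diag(1, r^2)."""
    return [[1.0, 0.0], [0.0, r * r]]


def metric_covariant_derivative(r, theta, scale=1.0, h=1e-6):
    """Return D[mu][nu][alpha] = g_{mu nu; alpha} at (r, theta)."""
    coords = [r, theta]
    g = metric_polar(r, theta)
    G = gamma_polar(r, scale)
    D = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    for alpha in range(2):
        plus = list(coords)
        minus = list(coords)
        plus[alpha] += h
        minus[alpha] -= h
        g_plus, g_minus = metric_polar(*plus), metric_polar(*minus)
        for mu in range(2):
            for nu in range(2):
                dg = (g_plus[mu][nu] - g_minus[mu][nu]) / (2 * h)
                corr = sum(G[beta][alpha][mu] * g[beta][nu] + G[beta][alpha][nu] * g[mu][beta]
                           for beta in range(2))
                D[mu][nu][alpha] = dg - corr
    return D


good = metric_covariant_derivative(1.7, 0.9)
bad = metric_covariant_derivative(1.7, 0.9, scale=2.0)
print("compatible connection, max |g_{mu nu;alpha}|:",
      max(abs(x) for plane in good for row in plane for x in row))    # about 0
print("doubled Gamma^theta_{r theta}: g_{theta theta;r} =", bad[1][1][0])  # about -2r = -3.4
```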
Finally, we note that the compatibility condition is the basis of other links between the metric and the covariant derivative. The connection coefficients $\Gamma^{\mu}{}_{\alpha\beta}$ may be derived directly from the components of the metric.${}^{18}$ We saw how the connection coefficients arose due to the change in the basis vectors with position in spacetime and could be calculated via derivatives like $\partial\boldsymbol{e}_{\mu}/\partial x^{\alpha}=\Gamma^{\lambda}{}_{\mu\alpha}\boldsymbol{e}_{\lambda}$. Recall also that the components of the metric reflect the basis vectors via $g_{\mu\nu}=\boldsymbol{g}\left(\boldsymbol{e}_{\mu},\boldsymbol{e}_{\nu}\right)$ or, more simply, $g_{\mu\nu}=\boldsymbol{e}_{\mu}\cdot\boldsymbol{e}_{\nu}$. It therefore comes as little surprise that the connection coefficients are formed from a combination of first derivatives of the metric components, as enshrined in the conceptual expression
We shall expand on this point in the coming chapters.
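Anticipating Chapter 9, the relation quoted in footnote 18, $g_{\rho\lambda}\Gamma^{\rho}{}_{\mu\sigma}=\frac{1}{2}\left(\partial g_{\lambda\mu}/\partial x^{\sigma}+\partial g_{\lambda\sigma}/\partial x^{\mu}-\partial g_{\mu\sigma}/\partial x^{\lambda}\right)$, already lets us compute connection coefficients from nothing but the metric components. The Python sketch below is a brute-force numerical version, assuming a two-dimensional metric supplied as a function of the coordinates; it differentiates the metric by central differences and multiplies by the inverse metric. For the unit sphere, $\mathrm{d}s^{2}=\mathrm{d}\theta^{2}+\sin^{2}\theta\,\mathrm{d}\phi^{2}$, it should return $\Gamma^{\theta}{}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}{}_{\theta\phi}=\Gamma^{\phi}{}_{\phi\theta}=\cot\theta$; for the plane-polar metric $\mathrm{diag}(1,r^{2})$ it should recover $\Gamma^{r}{}_{\theta\theta}=-r$ and $\Gamma^{\theta}{}_{r\theta}=1/r$.

```python
# Minimal sketch: connection coefficients from first derivatives of a 2D metric,
# Gamma^rho_{mu sigma} = g^{rho lam} * (1/2)(d_sigma g_{lam mu} + d_mu g_{lam sigma} - d_lam g_{mu sigma}).
import math


def sphere_metric(x):
    """Unit 2-sphere, x = (theta, phi): g = diag(1, sin^2 theta)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]


def polar_metric(x):
    """Flat plane in polar coordinates, x = (r, theta): g = diag(1, r^2)."""
    return [[1.0, 0.0], [0.0, x[0] ** 2]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def christoffel_from_metric(metric, x, h=1e-6):
    """Gamma[rho][mu][sigma] = Gamma^rho_{mu sigma} at the point x."""
    n = 2
    dg = []  # dg[s][l][m] approximates d g_{lm} / d x^s
    for s in range(n):
        plus = list(x)
        minus = list(x)
        plus[s] += h
        minus[s] -= h
        gp, gm = metric(plus), metric(minus)
        dg.append([[(gp[l][m] - gm[l][m]) / (2 * h) for m in range(n)] for l in range(n)])
    g_inv = inverse_2x2(metric(x))
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for rho in range(n):
        for mu in range(n):
            for sigma in range(n):
                gamma[rho][mu][sigma] = sum(
                    g_inv[rho][lam] * 0.5
                    * (dg[sigma][lam][mu] + dg[mu][lam][sigma] - dg[lam][mu][sigma])
                    for lam in range(n))
    return gamma


theta = 0.8
G = christoffel_from_metric(sphere_metric, [theta, 0.0])
print(G[0][1][1], -math.sin(theta) * math.cos(theta))   # Gamma^theta_{phi phi}
print(G[1][0][1], 1.0 / math.tan(theta))                 # Gamma^phi_{theta phi}
P = christoffel_from_metric(polar_metric, [2.0, 0.0])
print(P[0][1][1], P[1][0][1])                            # about -2 and 0.5
```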
In the next two chapters, we turn to the use of the covariant derivative in understanding how a particle moves under the influence of gravitation.
Chapter summary
Parallel transport provides a method of comparing vectors at different points in curved space. A vector that is moved such that it has the same components in different local coordinate systems has been parallel transported.
The covariant derivative can be used to measure how vector fields change with position in spacetime. It is a directional derivative given by
The tangent vector to the world line of a massive particle parametrized by the proper time is the timelike velocity vector $\boldsymbol{u}$, with the property $\boldsymbol{u}\cdot\boldsymbol{u}=-1$.
${}^{18}$ We shall see in Chapter 9 that the expression we will need is
\begin{equation*}
g_{\rho\lambda}\Gamma^{\rho}{}_{\mu\sigma}=\frac{1}{2}\left(\frac{\partial g_{\lambda\mu}}{\partial x^{\sigma}}+\frac{\partial g_{\lambda\sigma}}{\partial x^{\mu}}-\frac{\partial g_{\mu\sigma}}{\partial x^{\lambda}}\right). \tag{7.37}
\end{equation*}
and verify that $\boldsymbol{u}\cdot\mathrm{D}\boldsymbol{u}/\mathrm{d}s$ vanishes.
Now consider the reparametrization $t=\sin s$. This is not in the form $t=as+b$, so it is not an affine parametrization.
(c) Recompute the components of the tangent vector $\boldsymbol{u}$, the derivative $\mathrm{D}\boldsymbol{u}/\mathrm{d}t$, and $\boldsymbol{u}\cdot\mathrm{D}\boldsymbol{u}/\mathrm{d}t$ using this new parametrization.
(7.3) Consider the vector field $\boldsymbol{v}$ in two-dimensional flat space with Cartesian components $\left(v^{x}, v^{y}\right)=(0, Cx)$, with $C$ a constant.
(a) Compute the vector $\boldsymbol{\nabla}_{\mu}\boldsymbol{v}$ for $\mu=x$ and $y$.
(b) Convert the components of the vector into cylindrical polar coordinates using the transformations
from Chapter 3.
(c) Using the connection coefficients given in the chapter, compute the vectors $\boldsymbol{\nabla}_{\mu}\boldsymbol{v}$ for $\mu=r$ and $\theta$.
(d) Treat the quantity $\left(\boldsymbol{\nabla}_{\mu}\boldsymbol{v}\right)^{\nu}$ as the components of a $(1,1)$ tensor. Using the tensor transformation law, show that the results from part (c) are consistent with those of part (a).
(7.4) The covariant derivative of a $(0,2)$ tensor is written as
Apply this to the components of the metric tensor and show, using eqn 7.37, that $\left(\boldsymbol{\nabla}_{\alpha}\boldsymbol{g}\right)_{\mu\nu}=0$.
Free fall and geodesics
The bigger they come, the harder they fall
Barbados Joe Walcott (1873-1935) and Bob Fitzsimmons (1863-1917)
A geodesic can be thought of geometrically as the straightest possible path in a curved spacetime. Geodesics are the paths that extremize the interval between spacetime events. Physically, a particle in free fall has a world line that follows a geodesic.${}^{1}$ The equation of motion for such a particle, known as the geodesic equation, therefore tells us about the motion of a particle that is not subject to external forces. In this chapter, we investigate geodesics and derive the geodesic equation that tells us how curvature causes particles to move. Our task here is to introduce some key ideas in geodesic motion. In the next chapter, we look at the details of how to extract the connection coefficients required to compute geodesics.
Example 8.1
In pre-relativity physics, a particle subject to no forces does not accelerate and has an equation of motion given by $\ddot{\vec{x}}=0$. In relativity, a particle follows a path $x^{\mu}(\tau)=(t(\tau), x(\tau), y(\tau), z(\tau))$ parametrized by some affine parameter, such as its proper time $\tau$. If a particle is in a flat, Minkowski spacetime, we have an acceleration
where $b^{\mu}$ and $c^{\mu}$ are a set of constant components. We see that in Minkowski spacetime, the particles fall along straight lines.
8.1 Extremal intervals
Hamilton's principle tells us that the trajectory realized by a system is the one for which the action takes a stationary value when the path is varied. The action often takes a minimum value (e.g. for straight-line motion in a two-dimensional plane), but examples of why and when it takes a saddle-point or maximum value are less well known. However, they can be important in relativity.
${}^{1}$ One way to justify this is to recall from mechanics that the action for free, massive particles is given by $S=-m\int\left(-\mathrm{d}s^{2}\right)^{\frac{1}{2}}$. Extremizing the action yields the equations of motion for the particle. Since $m$ is a scalar, extremizing the action amounts to extremizing the timelike interval $\Delta\tau=\int\left(-\mathrm{d}s^{2}\right)^{\frac{1}{2}}$ between events with timelike separation. This means that a solution of the equations of motion gives us a geodesic (which is timelike for a massive particle). It is worth stressing that it is only free particles that travel along geodesics. Particles subject to other interactions have additional terms in their Lagrangian and so give rise to equations of motion whose solutions are not the geodesics of spacetime. Although this argument applies to massive particles and their timelike geodesics, we can also discuss null and spacelike geodesics. To compute spacelike geodesics we extremize the proper length $\Delta l=\int\left(+\mathrm{d}s^{2}\right)^{\frac{1}{2}}$ between events with spacelike separation. Although nothing travels on spacelike geodesics (except hypothetical tachyon particles, which travel faster than light), spacelike geodesics are often interesting and useful. Photons travel along null geodesics and are discussed in Section 8.4.
${}^{2}$ More generally, a conjugate point is a point where the matrix $N_{ij}$ is singular (equivalently, where $N_{ij}^{-1}$ fails to exist), and this occurs at $q_{1}^{*}$ since a non-zero value of $\delta p^{j}$ gives no variation in $\delta q^{i}$. The rule is that the trajectory is a minimum in the action if the trajectory does not pass through a conjugate point.
This was the case for path (i). The trajectory is not a minimum in the action if the trajectory passes through a conjugate point, which is the case for path (ii). In general, if two geodesics are sent out from a point $\mathcal{P}$ and later cross at a point $\mathcal{Q}$, then $\mathcal{Q}$ is a conjugate point to $\mathcal{P}$. This argument has a very important use in singularity theorems, such as the one discussed in Chapter 50.
Fig. 8.1 (a) Points $q_{1}$ and $q_{2}$ on the sphere. The conjugate point $q_{1}^{*}$ is at the antipode of $q_{1}$. Paths (i) and (ii), which lie on a great circle, are shown. (b) Setting off a swarm of particles at $q_{1}$ results in their trajectories realizing a focus at $q_{1}^{*}$.
Example 8.2
Consider non-relativistic motion between two points on the surface of a sphere, starting at $q_{1}$ and ending at $q_{2}$, as shown in Fig. 8.1(a). We assume that these points are not antipodal (i.e. not opposite points on the sphere). There are two geodesics: (i) a trajectory that represents the shortest distance between the points; and (ii) a trajectory that lies on the same great circle, but which heads off from $q_{1}$ in the other direction, through its antipodal point and then to $q_{2}$. For particles set off by an observer along these two paths with the same momentum, path (i) takes least time and gives the minimum action; path (ii) takes longer and gives a saddle-point action. There is a way to use this example to work out whether a trajectory is a minimum or not. We set in motion a swarm of free particles from $q_{1}$, almost along path (i), but with slightly different directions of their momentum, as shown in Fig. 8.1(b). The swarm spreads out initially, with each particle following its own geodesic. In general, the variation in initial momenta $\delta p^{j}$ (where $j$ labels the particle in question) will lead to variations in position $\delta q^{i}$ at some final time, given by a matrix equation $\delta q^{i}=N_{ij}\delta p^{j}$. However, on the sphere, the result of this thought experiment is that all of the trajectories eventually collapse down and focus at the antipode of $q_{1}$. This focal point at $q_{1}^{*}$ is known as a conjugate point${}^{2}$ to $q_{1}$. The trajectory that passes through $q_{1}^{*}$ is the saddle point; the trajectory that does not is the minimum.
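The focusing in Fig. 8.1(b) is easy to reproduce. The Python sketch below is an illustration under stated assumptions rather than part of the argument: it models the sphere as the unit sphere in $\mathbb{R}^{3}$ and uses the closed form of a great circle leaving the point $\boldsymbol{q}$ in the unit tangent direction $\boldsymbol{d}$, namely $\boldsymbol{q}\cos s+\boldsymbol{d}\sin s$, with $s$ the arc length. It launches nine particles from $q_{1}$ with directions spread over $\pm 0.2$ radians and reports the largest separation in the swarm. The separation is $2\sin(0.2)\,|\sin s|$, growing up to $s=\pi/2$ and collapsing to zero at $s=\pi$, where every path reaches the antipode $q_{1}^{*}$. The starting point and spread are arbitrary choices.

```python
# Minimal sketch: a swarm of great-circle geodesics leaving q1 refocuses at the
# antipode (unit sphere embedded in R^3; s is arc length).
import math


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def tangent_frame(q):
    """Two orthonormal tangent vectors at the point q of the unit sphere."""
    helper = (1.0, 0.0, 0.0) if abs(q[0]) < 0.9 else (0.0, 1.0, 0.0)
    t1 = normalize(cross(q, helper))
    t2 = cross(q, t1)            # unit length, since q and t1 are orthonormal
    return t1, t2


def geodesic_point(q, d, s):
    """Point at arc length s along the great circle leaving q in unit direction d."""
    return tuple(math.cos(s) * qi + math.sin(s) * di for qi, di in zip(q, d))


def swarm(q, s, n=9, half_width=0.2):
    """Positions at arc length s of n particles launched from q over +/- half_width radians."""
    t1, t2 = tangent_frame(q)
    points = []
    for k in range(n):
        psi = -half_width + 2.0 * half_width * k / (n - 1)
        d = tuple(math.cos(psi) * a + math.sin(psi) * b for a, b in zip(t1, t2))
        points.append(geodesic_point(q, d, s))
    return points


def spread(points):
    """Largest distance between any two particles in the swarm."""
    return max(math.dist(p, r) for p in points for r in points)


q1 = normalize((1.0, 2.0, 2.0))
for s in (0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4, math.pi):
    print(f"s = {s:.3f}  spread = {spread(swarm(q1, s)):.6f}")
```

The code needs Python 3.8 or later for `math.dist`.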
Turning now to geometry, the line element is given in terms of metric components by $\mathrm{d}s^{2}=g_{\mu\nu}\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}$. We shall be interested in the extremal value of the total interval $s=\int\sqrt{\left|\mathrm{d}s^{2}\right|}$ for the path between two points. Whether this extremal interval represents a maximum or minimum interval now depends on the signature of the metric.
Example 8.3
With a Riemannian metric (that is, one with signature $++++$) it is always possible to find arbitrarily long paths between two points, but the path length is bounded from below at a minimum value representing the shortest path between the two points. This path is a geodesic and represents the straightest possible curve in the space represented by the metric. (However, owing to the discussion in the last example, we cannot say that some given geodesic is necessarily the shortest distance between the two points.)
If we have a Lorentz metric (with signature $-+++$) then the interval between two points is positive if the interval is spacelike, negative if the interval is timelike and zero if the interval is null. It is always possible to find timelike curves with arbitrarily small intervals of proper time linking two points. If a curve of maximum proper time exists, it will be a timelike geodesic and represent the straightest path between the two points. This might seem the wrong way round, but follows from the minus sign in front of the timelike component in the metric. As a sanity check, we can use a graphical method to confirm that a timelike geodesic does not minimize the length of a curve. Consider Fig. 8.2, showing a timelike curve approximated by a series of null paths. The timelike interval $\Delta\tau=\int\mathrm{d}\tau=\int\left(-\mathrm{d}s^{2}\right)^{\frac{1}{2}}$ is positive, but the curve is infinitesimally close to a path formed from a series of null paths (Fig. 8.2) which, by definition, each have $\mathrm{d}s=0$, giving a vanishing total interval. It is therefore possible to make the timelike interval arbitrarily small.
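The argument around Fig. 8.2 can be quantified in a simple $1+1$-dimensional case (units with $c=1$). The Python sketch below compares the proper time of the straight world line from $(t,x)=(0,0)$ to $(T,0)$, a timelike geodesic of Minkowski spacetime, with a zig-zag path between the same events whose straight segments alternate between velocities $\pm v$. Each segment of duration $\Delta t$ contributes $\Delta t\sqrt{1-v^{2}}$, so the zig-zag has proper time $T\sqrt{1-v^{2}}$, which can be made as small as we like by letting $v\to 1$ so that the segments approach null paths. The function names and the choice $T=1$ are illustrative.

```python
# Minimal sketch: proper time of a straight world line versus zig-zag paths
# between the same two events in 1+1 Minkowski spacetime (c = 1).
import math


def proper_time(events):
    """Sum of sqrt(-ds^2) = sqrt(dt^2 - dx^2) over straight segments joining (t, x) events."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        ds2 = -(t1 - t0) ** 2 + (x1 - x0) ** 2    # signature (-, +)
        if ds2 > 1e-12:
            raise ValueError("segment is spacelike")
        total += math.sqrt(max(-ds2, 0.0))        # clip rounding noise on null segments
    return total


def zigzag(T, n_segments, speed):
    """Events from (0, 0) to (T, 0) alternating velocities +speed and -speed (n_segments even)."""
    dt = T / n_segments
    x = 0.0
    events = [(0.0, 0.0)]
    for k in range(n_segments):
        x += speed * dt if k % 2 == 0 else -speed * dt
        events.append(((k + 1) * dt, x))
    return events


print("straight line:", proper_time([(0.0, 0.0), (1.0, 0.0)]))   # 1.0
for v in (0.6, 0.99, 0.9999, 1.0):
    print(f"zig-zag, v = {v}:", proper_time(zigzag(1.0, 10, v)))  # about sqrt(1 - v^2)
```

The straight, unaccelerated world line has the largest proper time of all these paths, in line with the statement that timelike geodesics maximize rather than minimize the interval.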
We shall parametrize paths in spacetime using an affine parameter $\lambda$. To find the geodesic curve $x^{\mu}(\lambda)$ that extremizes the interval $s$ between two points, we split the interval into elements of length $\mathrm{d}s$ and write
which gives us a method for identifying the function $L\left(x^{\mu},\dot{x}^{\mu}\right)$ in this problem as $L=\left|(\mathrm{d}s/\mathrm{d}\lambda)^{2}\right|^{\frac{1}{2}}$. This function must obey the Euler-Lagrange (E-L) equation from Chapter 2, whose solution will allow us to identify the geodesic. Let's discuss a simple example.
Example 8.4
In Cartesian coordinates, in two dimensions we write the distance between points
The task is to find the shortest path between points in the plane (a spacelike geodesic). We know the answer: the path is, of course, a straight line. Applying the E-L equations for the variable $x$, we find
Similar expressions are found for the variable $y$.
Before powering ahead to solve these equations, it's useful to take a closer look at the idea of parametrizing a path. Notice that the particular recipe for choosing $\lambda$ along the curve hasn't been specified. Considering the interval from the previous example, we see that the choice of $\lambda$ is arbitrary: simply scaling $\lambda\rightarrow a\lambda+b$ has no effect on the action.${}^{3}$
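The reparametrization freedom can be checked directly on the length functional $s=\int\left(\dot{x}^{2}+\dot{y}^{2}\right)^{\frac{1}{2}}\mathrm{d}\lambda$. The Python sketch below takes a hypothetical quarter circle of unit radius as the path and evaluates the integral with Simpson's rule for three parametrizations: $\lambda$ itself, the affine choice $\lambda=2\eta+0.3$, and the non-affine choice $\lambda=\frac{\pi}{2}\eta^{2}$. All three return $\pi/2$. The non-affine case also works because the integrand is homogeneous of degree one in the velocities, so any monotonic reparametrization leaves the length unchanged.

```python
# Minimal sketch: the length of a quarter circle is the same for different
# parametrizations (velocities by central differences, Simpson's rule).
import math


def curve_length(curve, a, b, n=2000, h=1e-6):
    """Integral of sqrt(xdot^2 + ydot^2) over [a, b] by Simpson's rule (n even)."""
    def speed(p):
        x1, y1 = curve(p + h)
        x0, y0 = curve(p - h)
        return math.hypot((x1 - x0) / (2 * h), (y1 - y0) / (2 * h))

    step = (b - a) / n
    total = speed(a) + speed(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * speed(a + k * step)
    return total * step / 3


def quarter_circle(lam):
    """Unit circle parametrized by the polar angle lam."""
    return (math.cos(lam), math.sin(lam))


def affine(eta):
    return quarter_circle(2.0 * eta + 0.3)          # lambda = 2 eta + 0.3


def non_affine(eta):
    return quarter_circle(0.5 * math.pi * eta**2)   # lambda = (pi / 2) eta^2


print(curve_length(quarter_circle, 0.0, math.pi / 2))
print(curve_length(affine, -0.15, (math.pi / 2 - 0.3) / 2))
print(curve_length(non_affine, 0.0, 1.0))
print("pi/2 =", math.pi / 2)
```

The limits of integration are mapped along with the parameter, so each run covers the same arc from $\lambda=0$ to $\lambda=\pi/2$.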
We shall exploit the freedom to choose $\lambda$ to make life easy for us. We vary the Lagrangian the first time, to find $\partial L/\partial x$ and $\partial L/\partial\dot{x}$, with an as-yet-unspecified parametrization $\lambda$. This tells us how the action changes with $x$ and $\dot{x}=\mathrm{d}x/\mathrm{d}\lambda$. After this stage, the dependence of the interval on the parameter $\lambda$ has been determined, but the precise choice of $\lambda$ is still unspecified.${}^{4}$ Next, we make a choice of $\lambda$. The labour-saving parametrization to choose is called length parametrization. This choice is simply $\mathrm{d}\lambda=\mathrm{d}s$, which implies that the parameter $\lambda$ measures the interval along the curve. Physically, $\lambda$ then represents the proper time for timelike paths or the proper length for spacelike paths. So for timelike paths, $\lambda$ is the (proper) time measured by the observer in their locally flat spacetime as they fall along the geodesic. Length parametrization is therefore not just convenient: it is necessary to allow us to interpret $s$ as the interval between events in spacetime.
In the case of Cartesian coordinates discussed above, we would write $\mathrm{d}\lambda=\mathrm{d}s=\left(\mathrm{d}x^{2}+\mathrm{d}y^{2}\right)^{1/2}$, which, inserted into the Lagrangian, yields
\begin{equation*}
L=\left[\left(\frac{\mathrm{d}x}{\mathrm{d}s}\right)^{2}+\left(\frac{\mathrm{d}y}{\mathrm{d}s}\right)^{2}\right]^{1/2}=1.
\end{equation*}
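Fig. 8.2 A smooth timelike curve can be approximated by a series of null paths. As the number of null paths is increased, we get closer to the timelike curve. A timelike curve is in this sense infinitesimally close to a series of curves of zero length.
${}^{3}$ If we have another parameter $\eta$, such that $\lambda=\lambda(\eta)$ with $\mathrm{d}\lambda/\mathrm{d}\eta>0$, we write
\begin{equation*}
\int \mathrm{d}\lambda\left|\left(\frac{\mathrm{d}s}{\mathrm{d}\lambda}\right)^{2}\right|^{1/2}=\int \mathrm{d}\eta\,\frac{\mathrm{d}\lambda}{\mathrm{d}\eta}\left|\left(\frac{\mathrm{d}s}{\mathrm{d}\eta}\right)^{2}\right|^{1/2}\frac{\mathrm{d}\eta}{\mathrm{d}\lambda}=\int \mathrm{d}\eta\left|\left(\frac{\mathrm{d}s}{\mathrm{d}\eta}\right)^{2}\right|^{1/2}.
\end{equation*}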
This implies that we can just as well use $\eta$ as $\lambda$ and expect no change in the form of the action integral.
${}^{4}$ One can think of $\lambda$ as cancelling in the left-hand side of the E-L equation
${}^{5}$ The dot notation is used here to denote a derivative with respect to the parameter $\lambda$.
This does not imply that we are varying unity (with the inevitable result that the terms in the equation vanish): we have already taken a first set of derivatives and so the dependence on $\lambda$ is now fixed. A consequence of the choice of parametrization is that, in addition to the equations of motion derived from the Euler-Lagrange equations, we also have an additional constraint equation, which for our choice is $L=1$. There is some redundancy here, because the Euler-Lagrange equations alone give us enough information to find the extremal path. However, the extra equation frequently simplifies the algebra and is therefore a valuable help.
After this detour, let's now return to our simple example.
Example 8.5
Use our freedom to choose $\lambda$ such that $L=1$ for the remainder of the calculation. The E-L equation for $x$ then becomes
\begin{equation*}
\frac{\mathrm{d}^{2}x}{\mathrm{d}\lambda^{2}}=0,
\end{equation*}
and also, of course, $\mathrm{d}^{2}y/\mathrm{d}\lambda^{2}=0$.
Since the second derivative of each of the coordinates is zero, the path must be a straight line. We can check that these three equations (the two E-L equations and the constraint $L=1$) are solved with the equation for a straight line
These equations are solved if $\phi$ is constant and $\theta$ increases linearly with $\lambda$ (path A in Fig. 8.3). There is a similar solution where $\theta=\pi/2$ and $\phi$ increases linearly with $\lambda$ (path B in Fig. 8.3). These solutions are familiar as the shortest paths between points on a sphere, since they are arcs of great circles. In general, a curve of constant $\theta$ is not a solution, so path C in Fig. 8.3, which has $\theta=\pi/4$, is not a geodesic.${}^{7}$
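Paths A, B and C can be checked by substituting them into the unit-sphere equations of motion with $L=1$. These are $\ddot{\theta}-\sin\theta\cos\theta\,\dot{\phi}^{2}=0$ and $\ddot{\phi}+2\cot\theta\,\dot{\theta}\dot{\phi}=0$, which is what the E-L procedure gives for $\mathrm{d}s^{2}=\mathrm{d}\theta^{2}+\sin^{2}\theta\,\mathrm{d}\phi^{2}$. The Python sketch below takes the derivatives by finite differences. The evaluation point $\lambda=0.7$ is arbitrary, chosen to stay away from the pole, where $\cot\theta$ diverges.

```python
import math

def residuals(theta, phi, lam, h=1e-4):
    """Residuals of θ̈ - sinθcosθ φ̇² = 0 and φ̈ + 2cotθ θ̇φ̇ = 0 at parameter value lam."""
    th = theta(lam)
    th_d = (theta(lam + h) - theta(lam - h)) / (2 * h)
    ph_d = (phi(lam + h) - phi(lam - h)) / (2 * h)
    th_dd = (theta(lam + h) - 2 * th + theta(lam - h)) / h**2
    ph_dd = (phi(lam + h) - 2 * phi(lam) + phi(lam - h)) / h**2
    r_theta = th_dd - math.sin(th) * math.cos(th) * ph_d**2
    r_phi = ph_dd + 2 * math.cos(th) / math.sin(th) * th_d * ph_d
    return r_theta, r_phi

paths = {
    "A": (lambda lam: lam, lambda lam: 0.0),          # meridian
    "B": (lambda lam: math.pi / 2, lambda lam: lam),  # equator
    "C": (lambda lam: math.pi / 4, lambda lam: lam),  # circle of latitude
}
for name, (theta, phi) in paths.items():
    print(name, residuals(theta, phi, lam=0.7))
```

Paths A and B give residuals at round-off level. Path C leaves $-\sin(\pi/4)\cos(\pi/4)=-1/2$ in the $\theta$ equation.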
With some geodesics under our belt, we now turn to the more general problem of finding the geodesic representing the path of a particle in free fall.
8.2 A geodesic equation
We define the covariant acceleration vector for a massive particle as $\boldsymbol{a}=\mathrm{D}\boldsymbol{u}/\mathrm{d}\tau$, where $\boldsymbol{u}$ is the particle's velocity and the proper time $\tau$ parametrizes the world line. A particle in free fall follows a geodesic. It feels no force (by definition of free fall), so it has no covariant acceleration, giving $\mathrm{D}\boldsymbol{u}/\mathrm{d}\tau=0$. Geometrically, the particle's velocity is tangent to its world line, so an equivalent expression is $\boldsymbol{a}=\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}=0$. Recall that parallel transport of a vector $\boldsymbol{v}$ along a path with tangent $\boldsymbol{u}$ implies that the covariant derivative $\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{v}$ vanishes. We therefore arrive at a geometrical definition: a geodesic is a path in spacetime that parallel transports its own tangent vector. Although we've discussed the dynamics of a particle, this geometric definition applies to any geodesic (e.g. a spacelike one) parametrized by an arbitrary affine parameter $\lambda$. Therefore, we can test if a curve with tangent vector $\boldsymbol{u}$ is a geodesic using
\begin{equation*}
\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}=\left(u^{\alpha}\frac{\partial u^{\mu}}{\partial x^{\alpha}}+\Gamma^{\mu}{}_{\alpha\beta}u^{\alpha}u^{\beta}\right)\boldsymbol{e}_{\mu}=0.
\end{equation*}
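We notice that the first term, $u^{\alpha}\partial u^{\mu}/\partial x^{\alpha}$, can be written as $\mathrm{d}u^{\mu}/\mathrm{d}\lambda$ by the chain rule, since $u^{\alpha}=\mathrm{d}x^{\alpha}/\mathrm{d}\lambda$. This has the form of the acceleration in an ordinary, flat, Cartesian system: the second derivative of $x^{\mu}$ with respect to $\lambda$. This allows us to write a differential equation for the path of a geodesic, known as the geodesic equation
\begin{equation*}
\frac{\mathrm{d}^{2}x^{\mu}}{\mathrm{d}\lambda^{2}}+\Gamma^{\mu}{}_{\alpha\beta}\frac{\mathrm{d}x^{\alpha}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\beta}}{\mathrm{d}\lambda}=0.
\end{equation*}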
${}^{7}$ For example, set $\theta(\lambda)=\lambda$ and $\phi(\lambda)=0$ and the equations are solved. Similarly, $\theta=\pi/2$ and $\phi=\lambda$ solve the equations. Setting $\theta=\pi/4$ and $\phi=\lambda$ does not solve the equations.
Fig. 8.3 Paths on the surface of a sphere. A (which runs from the North Pole to the equator) and B (which runs round the equator) are geodesics; C (dashed curve) is not.
${}^{8}$ Keep in mind that instead of $\tau$, we are free to use any affine parameter $\lambda$ related to $\tau$ via $\lambda=a\tau+b$, where $a$ and $b$ are constants. In fact, this provides us with another definition of an affine parameter: affine parameters are those parameters for which the description of the world line has the form of the geodesic equation. Note that intervals for photons cannot be assigned a proper time (or length) since they travel along null geodesics. In terms of an affine parameter $\sigma$, the infinitesimal interval between two closely spaced events on a photon's world line is
where $\mathrm{d}x^{\sigma}$ is the coordinate interval between the events. We discuss photons further at the end of this chapter.
Taking $\lambda=\tau$, this is the equation of motion for a massive particle in curved spacetime in the absence of an external force. We can interpret the geodesic equation as telling us that particles not subject to an external force fall freely along a geodesic, following what a local observer would interpret to be straight lines.${}^{8}$
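To see the geodesic equation at work, the Python sketch below integrates it with a classical fourth-order Runge-Kutta step for the flat plane in polar coordinates $(r,\theta)$. It assumes the standard non-zero coefficients $\Gamma^{r}{}_{\theta\theta}=-r$ and $\Gamma^{\theta}{}_{r\theta}=\Gamma^{\theta}{}_{\theta r}=1/r$. Both $r(\lambda)$ and $\theta(\lambda)$ are non-linear, but the Cartesian image $x=r\cos\theta$ stays fixed, so the geodesic is the straight line $x=1$. The step size and starting point are illustrative.

```python
import math

def christoffel_polar(r, theta):
    """Γ^μ_{αβ} for the flat plane in polar coordinates; index 0 = r, 1 = θ."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r                     # Γ^r_θθ
    G[1][0][1] = G[1][1][0] = 1.0 / r   # Γ^θ_rθ = Γ^θ_θr
    return G

def rhs(state, christoffel):
    """state = [x^0, x^1, u^0, u^1]; geodesic equation du^μ/dλ = -Γ^μ_αβ u^α u^β."""
    x, u = state[:2], state[2:]
    G = christoffel(*x)
    acc = [-sum(G[m][a][b] * u[a] * u[b] for a in range(2) for b in range(2))
           for m in range(2)]
    return u + acc

def rk4_step(state, h, christoffel):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state, christoffel)
    k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)], christoffel)
    k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)], christoffel)
    k4 = rhs([s + h * k for s, k in zip(state, k3)], christoffel)
    return [s + h * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

# Start at r = 1, θ = 0 with u^r = 0, u^θ = 1 (unit speed, since g_θθ = r² = 1 here).
state = [1.0, 0.0, 0.0, 1.0]
path = [state]
for _ in range(2000):                   # λ from 0 to 2 in steps of 1e-3
    state = rk4_step(state, 1e-3, christoffel_polar)
    path.append(state)

xs = [r * math.cos(th) for r, th, _, _ in path]
ys = [r * math.sin(th) for r, th, _, _ in path]
print("max |x - 1| =", max(abs(x - 1.0) for x in xs), " final y =", ys[-1])
```

Passing the connection as a function keeps the integrator independent of the metric, so other coefficient sets can be substituted unchanged.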
Example 8.7
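The scalar $\boldsymbol{u}\cdot\boldsymbol{u}$, built from the tangent vector $\boldsymbol{u}$ of a geodesic, determines whether the geodesic is timelike ($\boldsymbol{u}\cdot\boldsymbol{u}=-1$), null ($\boldsymbol{u}\cdot\boldsymbol{u}=0$) or spacelike (where, if we take the affine parameter $\lambda$ to be the proper length, we will have $\boldsymbol{u}\cdot\boldsymbol{u}=1$). We can see how this quantity changes along a geodesic by evaluating
\begin{equation*}
\frac{\mathrm{d}}{\mathrm{d}\lambda}(\boldsymbol{u}\cdot\boldsymbol{u})=\boldsymbol{\nabla}_{\boldsymbol{u}}(\boldsymbol{u}\cdot\boldsymbol{u})=2\,\boldsymbol{u}\cdot\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}=0,
\end{equation*}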
where the zero on the right-hand side of the last expression follows because $\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}=0$ for a geodesic. We conclude that a tangent vector that is timelike at one point remains timelike all along the geodesic, and similarly for null and spacelike tangent vectors.
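The conservation of $\boldsymbol{u}\cdot\boldsymbol{u}$ can also be seen numerically. The Python sketch below is an illustration built on the standard unit-sphere equations $\ddot{\theta}=\sin\theta\cos\theta\,\dot{\phi}^{2}$ and $\ddot{\phi}=-2\cot\theta\,\dot{\theta}\dot{\phi}$. It starts from a unit tangent vector and tracks $\boldsymbol{u}\cdot\boldsymbol{u}=\dot{\theta}^{2}+\sin^{2}\theta\,\dot{\phi}^{2}$. The initial data are arbitrary, and the remaining drift is at the level of the Runge-Kutta truncation error.

```python
import math

def rhs(state):
    """Unit-sphere geodesic equations; state = [θ, φ, θ̇, φ̇]."""
    th, ph, thd, phd = state
    thdd = math.sin(th) * math.cos(th) * phd**2             # -Γ^θ_φφ φ̇²
    phdd = -2.0 * math.cos(th) / math.sin(th) * thd * phd   # -2 Γ^φ_θφ θ̇ φ̇
    return [thd, phd, thdd, phdd]

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = rhs([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def u_dot_u(state):
    """u·u = g_θθ θ̇² + g_φφ φ̇² with g_θθ = 1 and g_φφ = sin²θ."""
    th, _, thd, phd = state
    return thd**2 + (math.sin(th) * phd)**2

th0 = math.pi / 2 - 0.3
state = [th0, 0.0, 0.6, 0.8 / math.sin(th0)]   # chosen so that u·u = 1 initially
norms = [u_dot_u(state)]
for _ in range(3000):                           # λ from 0 to 3
    state = rk4_step(state, 1e-3)
    norms.append(u_dot_u(state))
print("largest drift in u·u:", max(abs(n - 1.0) for n in norms))
```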
The geodesic equation in its geometric version, $\boldsymbol{\nabla}_{\boldsymbol{u}}\boldsymbol{u}=0$, can also be used to motivate a geometric form of momentum conservation, which can be expressed as
\begin{equation*}
\boldsymbol{\nabla}_{\boldsymbol{p}}\boldsymbol{p}=0.
\end{equation*}
We can always find a local inertial frame (LIF) coordinate system, in which all of the $\Gamma$'s vanish at a point. In it, flat-space momentum conservation ($\mathrm{d}\boldsymbol{p}/\mathrm{d}\tau=0$) can be written as $\boldsymbol{\nabla}_{\boldsymbol{p}}\boldsymbol{p}=0$. As this is a valid tensor equation in a flat frame, by the principle of general covariance, the same equation must also be true in any frame.${}^{9}$
8.3 Inertial forces
The geodesic equation tells us that if spacetime gives non-zero connection coefficients $\Gamma^{\mu}{}_{\alpha\beta}$, we observe an acceleration, even in the absence of an external force. This acceleration could be due to spacetime being curved, or simply to our choice of coordinates. We are used to accelerations resulting from forces, so we interpret the acceleration $\ddot{x}^{\mu}$ that results from the connection coefficients as corresponding to an inertial force. The inertial force on a particle of mass $m$ is given by the geodesic equation as
\begin{equation*}
m\ddot{x}^{\mu}=-m\,\Gamma^{\mu}{}_{\alpha\beta}\dot{x}^{\alpha}\dot{x}^{\beta},
\end{equation*}
where the dot notation means $\dot{x}^{\mu}=\mathrm{d}x^{\mu}/\mathrm{d}\tau$.
Example 8.9
In classical mechanics, if you move in an accelerating frame of reference, such as the rotating Earth, you feel an inertial force. Although these are sometimes known as fictitious forces, there is little fictitious about them to the person experiencing them.
Consider the game swingball${}^{10}$, in which a tennis ball is connected by a string to a post (as shown in Fig. 8.4) and executes circular motion in the horizontal plane. In the frame of the players, the ball, of mass $m$, accelerates as it swings around the pole. Referring to the figure, the tension $T$ in the string supplies a vertical component $T\cos\theta=mg$ that balances the gravitational force. The horizontal component $T\sin\theta$ is not cancelled. Instead, it supplies the centripetal force $mv^{2}/r$ that maintains the circular motion, where $v$ is the speed and $r$ the radius of the circular motion. From the local frame of the tennis ball the picture is rather different. The ball is at rest in its own frame, so the horizontal component of the tension must be balanced by a new force $\vec{F}$. This is the centrifugal force: an outward-directed force felt by the ball as a real force, but not evident in the players' frame.
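The force balance in the players' frame fixes both the string angle and the tension. The short Python sketch below uses hypothetical values for the ball's mass, speed and radius. In the ball's frame, the same numbers give the size of the outward centrifugal force, $mv^{2}/r$.

```python
import math

def swingball(m, v, r, g=9.81):
    """Angle of the string from the vertical and its tension for steady circular motion."""
    horizontal = m * v**2 / r       # T sinθ must supply the centripetal force
    vertical = m * g                # T cosθ must balance the weight
    theta = math.atan2(horizontal, vertical)
    tension = math.hypot(horizontal, vertical)
    return theta, tension

m, v, r = 0.06, 3.0, 0.8            # hypothetical: 60 g ball, 3 m/s, 0.8 m radius
theta, T = swingball(m, v, r)
centrifugal = m * v**2 / r          # outward force needed for balance in the ball's frame
print(f"angle {math.degrees(theta):.1f} deg, tension {T:.3f} N, "
      f"net horizontal force in ball frame {T * math.sin(theta) - centrifugal:.1e} N")
```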
The principle of equivalence teaches us that the acceleration due to gravitation, for a single particle, is indistinguishable from the acceleration due to a particular choice of coordinates. Since free particles fall along geodesics, the geodesic equation says that the acceleration $a^{\mu}$ is given by $\ddot{x}^{\mu}=-\Gamma^{\mu}{}_{\alpha\lambda}\dot{x}^{\alpha}\dot{x}^{\lambda}$, which is to say that gravitation gives rise to an inertial force.
Example 8.10
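Consider the metric that describes a weak gravitational field
\begin{equation*}
\mathrm{d}s^{2}=-[1+2\Phi(x,y,z)]\,\mathrm{d}t^{2}+\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2},
\end{equation*}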
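where $\Phi(x,y,z)=-GM/(x^{2}+y^{2}+z^{2})^{1/2}$ is the gravitational potential. This can be used to give an equation of motion${}^{11}$
\begin{equation*}
\frac{\mathrm{d}^{2}x^{i}}{\mathrm{d}\tau^{2}}=-\Gamma^{i}{}_{tt}\left(\frac{\mathrm{d}t}{\mathrm{d}\tau}\right)^{2}=-\frac{GMx^{i}}{r^{3}}\left(\frac{\mathrm{d}t}{\mathrm{d}\tau}\right)^{2},
\end{equation*}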
where $r^{2}=x^{2}+y^{2}+z^{2}$. In a non-relativistic limit, we have $\tau\approx t$. The connection therefore supplies the components of the gravitational acceleration (force per unit mass), $-\Gamma^{i}{}_{tt}$, in the geodesic equation.
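For a static metric with a flat spatial part, $\Gamma^{i}{}_{tt}=-\tfrac{1}{2}g^{ii}\partial_{i}g_{tt}$. The Python sketch below differentiates $g_{tt}=-(1+2\Phi)$ numerically and compares $-\Gamma^{i}{}_{tt}$ with the Newtonian acceleration $-GMx^{i}/r^{3}$. It assumes units with $G=c=1$, $\mathrm{d}t/\mathrm{d}\tau\approx 1$ and an arbitrary test point.

```python
import math

GM = 1.0   # hypothetical central mass, units with G = c = 1

def g_tt(pos):
    """g_tt = -(1 + 2Φ) with Φ = -GM/r."""
    x, y, z = pos
    phi = -GM / math.sqrt(x * x + y * y + z * z)
    return -(1.0 + 2.0 * phi)

def gamma_i_tt(pos, i, h=1e-6):
    """Γ^i_tt = -½ g^{ii} ∂_i g_tt, taking g^{ii} = 1 for the flat spatial part."""
    plus, minus = list(pos), list(pos)
    plus[i] += h
    minus[i] -= h
    return -0.5 * (g_tt(plus) - g_tt(minus)) / (2 * h)

pos = (3.0, 4.0, 0.0)                                   # r = 5
r = math.sqrt(sum(c * c for c in pos))
geodesic_acc = [-gamma_i_tt(pos, i) for i in range(3)]  # ẍ^i ≈ -Γ^i_tt when dt/dτ ≈ 1
newtonian_acc = [-GM * c / r**3 for c in pos]
print(geodesic_acc)
print(newtonian_acc)
```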
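If a particle is subject to an externally applied 4-force $\boldsymbol{f}$ with components $f^{\mu}$, then this supplies a non-zero right-hand side of the geodesic equation and we have, in components,
\begin{equation*}
m\left(\frac{\mathrm{d}^{2}x^{\mu}}{\mathrm{d}\tau^{2}}+\Gamma^{\mu}{}_{\alpha\beta}\frac{\mathrm{d}x^{\alpha}}{\mathrm{d}\tau}\frac{\mathrm{d}x^{\beta}}{\mathrm{d}\tau}\right)=f^{\mu}.
\end{equation*}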
${}^{10}$ Swingball is more usually known as tetherball outside Britain.
Fig. 8.4 Swingball in (a) the frame of the players and (b) the frame of the ball.
${}^{11}$ See the next chapter for details of the method.
Example 8.11
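We can use the connection coefficients from the last chapter for motion in cylindrical coordinates to derive equations of motion using the geodesic equation. In the non-relativistic limit, we take $\tau\approx t$ and write${}^{12}$
\begin{equation*}
m\left[a^{r}-r\left(u^{\theta}\right)^{2}\right]=f^{r},\qquad m\left[a^{\theta}+\frac{2}{r}u^{r}u^{\theta}\right]=f^{\theta},\qquad m\,a^{z}=f^{z},
\end{equation*}
where $u^{\mu}=\mathrm{d}x^{\mu}/\mathrm{d}t$ and $a^{\mu}=\mathrm{d}u^{\mu}/\mathrm{d}t$.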
The trajectory of a particle following uniform circular motion in flat spacetime is not a geodesic, because the geodesics are straight lines. From eqn 8.35 we see that if a particle is undergoing uniform circular motion in this coordinate system, we have $a^{r}=0$. An inward-directed external force $f^{r}$ (the centripetal force) must therefore be applied in order to balance the inertial force $mr(u^{\theta})^{2}$. If we want the circular motion to be uniform, with $a^{\theta}=0$, then having $u^{r}=0$ guarantees that no component of force is needed in the $\theta$ direction.
8.4 Geodesics for photons
One of the most striking predictions of general relativity is that curved spacetime affects the motion of light. Photons, the particles of light, are massless and, in the absence of interactions, fall along null geodesics.${}^{13}$ At each point along a null geodesic, a light cone will be tangent to the curve, as shown in Fig. 8.5. A simple way to analyse the paths of light is, therefore, to consider directly the constraint $\mathrm{d}s^{2}=0$ for photons, as demonstrated in the following example.
Example 8.12
Consider a light ray travelling in a weak, Newtonian gravitational field. Earlier we used the line element in eqn 8.30 to describe the geometry. Recall that the more accurate expression, which includes the correction to the spacelike parts, is given by${}^{14}$
\begin{equation*}
\mathrm{d}s^{2}=-[1+2\Phi(x,y,z)]\,\mathrm{d}t^{2}+[1-2\Phi(x,y,z)]\left(\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}\right),
\end{equation*}
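where $\Phi(x,y,z)=-GM/r$. Setting $\mathrm{d}s^{2}=0$ for photons, we find an expression for the null geodesics in terms of the coordinates $r$ and $t$. For a ray travelling radially, so that $\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2}=\mathrm{d}r^{2}$, this expression is
\begin{equation*}
\frac{\mathrm{d}t}{\mathrm{d}r}=\pm\left(\frac{1-2\Phi}{1+2\Phi}\right)^{1/2}=\pm\left(\frac{1+2GM/r}{1-2GM/r}\right)^{1/2}.
\end{equation*}
As we saw in Chapter 5, this can be used to investigate the light cone structure of this geometry.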
This expression can also be used to demonstrate a famous relativistic effect known as Shapiro time delay. A light pulse is sent from a distant planet to the Earth, passing close to the Sun, with a distance of closest approach $b$, as shown in Fig. 8.6. Imagine light travelling along the $x$-direction between two coordinate points $(-d_{\mathrm{p}},y_{0},z_{0})$ and $(d_{\mathrm{e}},y_{0},z_{0})$. The spatial distance element $(\mathrm{d}x^{2}+\mathrm{d}y^{2}+\mathrm{d}z^{2})^{1/2}$, which plays the role of $\mathrm{d}r$ in the null condition, is then simply $\mathrm{d}x$. Expanding the equation describing the geodesic in the limit of small $GM/r$,${}^{15}$ we find that the coordinate time elapsed between the events is
\begin{equation*}
t=\int \mathrm{d} x\left(1+\frac{2 G M}{r}\right)=\int \mathrm{d} x\left[1+\frac{2 G M}{\left(x^{2}+y^{2}+z^{2}\right)^{\frac{1}{2}}}\right] . \tag{8.39}
\end{equation*}
For a path from $x=0$ to $x=x_{0}$, with $y^{2}+z^{2}=b^{2}$ along the path, we can integrate${}^{16}$ to find, for small $b/x_{0}$,
\begin{equation*}
t \approx x_{0}+2 G M \ln \frac{2\left|x_{0}\right|}{b}, \tag{8.40}
\end{equation*}
to leading order. The time taken by the light pulse is the usual time $t=x_{0}/c$ plus a relativistic delay caused by the gravitational field of the Sun. For the geometry in the figure, the delays for the two legs of the journey add, and we find a total relativistic time delay of
\begin{equation*}
\Delta t=2 G M \ln \frac{4 d_{\mathrm{p}} d_{\mathrm{e}}}{b^{2}} \tag{8.41}
\end{equation*}
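Two numerical checks of this result are sketched below in Python. The first works in units with $G=c=1$, with illustrative values $b=1$ and $x_{0}=1000$. It compares Simpson's rule for the delay part of eqn 8.39 with the exact antiderivative (footnote 16) and with the leading-order form of eqn 8.40. The second evaluates eqn 8.41 with $c$ restored, $\Delta t=(2GM/c^{3})\ln(4d_{\mathrm{p}}d_{\mathrm{e}}/b^{2})$. It assumes approximate solar-system values: a Mars-like planet at 1.524 AU, the Earth at 1 AU and a ray grazing the solar limb. The result is roughly $1.2\times10^{-4}$ s for the one-way trip; a radar echo would accumulate about twice this.

```python
import math

def simpson(f, a, b, n=100000):
    """Composite Simpson's rule on [a, b] with an even number n of intervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

# One leg in units G = c = 1 (illustrative values): delay part of eqn 8.39 with y² + z² = b².
GM, b, x0 = 1.0, 1.0, 1.0e3
numeric = simpson(lambda x: 2 * GM / math.sqrt(x * x + b * b), 0.0, x0)
exact = 2 * GM * math.asinh(x0 / b)        # 2GM ln[(x0 + (x0² + b²)^(1/2)) / b]
leading = 2 * GM * math.log(2 * x0 / b)    # leading-order form of eqn 8.40
print(numeric, exact, leading)

# Eqn 8.41 with c restored. All inputs below are approximate, assumed values.
GM_SUN = 1.327e20            # m^3 s^-2
C = 2.998e8                  # m s^-1
AU = 1.496e11                # m
d_e, d_p = 1.0 * AU, 1.524 * AU
b_sun = 6.96e8               # m: ray grazing the solar limb
delay = 2 * GM_SUN / C**3 * math.log(4 * d_p * d_e / b_sun**2)
print(f"one-way delay ~ {delay * 1e6:.0f} microseconds")
```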
Chapter summary
Geodesics are the paths that free particles fall along in general relativity. They can be found by extremizing the interval along paths between two events using the calculus of variations, which is much simplified if length parametrization is used. Free massive particles follow timelike geodesics.
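The equation of motion $m\left(\ddot{x}^{\mu}+\Gamma^{\mu}{}_{\alpha\beta}\dot{x}^{\alpha}\dot{x}^{\beta}\right)=f^{\mu}$ relates the acceleration to the geometry and any external forces. The term that includes the connection coefficients gives rise to inertial forces. When $f^{\mu}=0$, the equation describes a geodesic.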
Light travels along null geodesics, which are easiest to analyse using the null condition $\mathrm{d}s^{2}=0$.
Irwin Shapiro (1929- ) suggested this as a fourth test of general relativity. The other three so-called classical solar-system tests of general relativity are (i) the perihelion precession of Mercury and (ii) the deflection of light by the Sun (both described in Chapter 23), and (iii) the gravitational redshift of light (Chapter 13).
${}^{15}$ Restoring factors of $c$, this is the limit of small $GM/c^{2}r$.
${}^{16}$ Use the result $\int \frac{\mathrm{d}x}{\sqrt{x^{2}+a^{2}}}=\ln\left(x+\sqrt{x^{2}+a^{2}}\right)$.
Fig. 8.6 The geometry for the Shapiro time delay, showing the planet (p), and the Earth (e), with the Sun (S) at the origin.
Exercises
(8.1) We will find the shortest distance between two points in flat space, expressed in cylindrical polar coordinates, where the interval is written as
\begin{equation*}
a \tan \left(\theta-\theta_{0}\right)=\lambda \tag{8.47}
\end{equation*}
(8.2) Despite its complicated appearance, the solution in the previous problem does represent a straight line expressed in cylindrical coordinates. We can show this by referring to Fig. 8.7, which shows how $a$ and $\theta_{0}$ should be interpreted.
(a) Eliminate $\lambda$ to show
Fig. 8.7 The geometry of a straight line in polar coordinates.
(b) In Cartesian coordinates, a straight line can be written as $\alpha x+\beta y=\gamma$. Using the substitutions
show that eqn 8.48 is, indeed, a description of a straight line.
(8.3) Suppose we knew nothing about spacetime curvature or the details of geometry. We would still need to use the geodesic equation, as we shall demonstrate. Consider a particle which is accelerating. In a frame that moves along with the particle, in which its position is $\xi^{\mu}$, the particle can't feel its own weight, which is to say that no forces act on it and it undergoes no acceleration. As a result, in this frame, $\mathrm{d}^{2}\xi^{\mu}/\mathrm{d}\tau^{2}=0$. Now consider how this particle's trajectory appears in some other frame with coordinates $x^{\nu}$. By using the chain rule, show
and interpret this equation.
(8.4) The covariant derivative of a 1-form $\tilde{\boldsymbol{\sigma}}$ will be discussed in Part V. For now we can simply note that it can be written as
(a) Use this to compute a geodesic equation for a velocity 1-form and hence for acceleration $\ddot{x}_{\mu}$.
(b) Use the result of part (a) to prove that the equation of motion in eqn 8.33 for massive particles has the property that $\boldsymbol{f}\cdot\boldsymbol{u}=0$, for a force $\boldsymbol{f}$ and instantaneous velocity $\boldsymbol{u}$.
Geodesic equations and connection coefficients
I know now that if I break my neck by falling off a cliff, my death is not to be blamed on the force of gravity (what does not exist is necessarily guiltless), but on the fact that I did not maintain the first curvature of my world-line, exchanging its security for a dangerous geodesic.
John Lighton Synge (1897-1995)
Synge's words, quoted above, remind us that our new geometric perspective on gravity motivates us to think about the curvature of our world line.${}^{1}$ In the last two chapters, we saw that the way basis vectors change as we move through spacetime is reflected in the connection coefficients $\Gamma^{\mu}{}_{\alpha\beta}$. These feature in the geodesic equation, the equation of motion for a particle in free fall. The connection coefficients are important, not only for the role they play in the geodesic equation, but because they tell us about the curvature of spacetime itself. In this chapter, we describe a method to extract the connection coefficients. As shown in Fig. 9.1, the idea is to input the metric and output the connection. The most important point of this chapter is the following. The metric field of spacetime, via the line element $\mathrm{d}s^{2}=g_{\mu\nu}\mathrm{d}x^{\mu}\mathrm{d}x^{\nu}$, generates the geodesics that freely falling particles follow and, therefore, the connection coefficients.
9.1 Finding connection coefficients
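Let's formulate our method. The interval between spacetime points $a$ and $b$ can be written as the integral${}^{2}$
\begin{equation*}
s=\int_{a}^{b}\mathrm{d}\lambda\left|g_{\mu\nu}\frac{\mathrm{d}x^{\mu}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\nu}}{\mathrm{d}\lambda}\right|^{1/2}.
\end{equation*}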
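We use the Euler-Lagrange equations on the integrand to find the equations of motion. We saw in the last chapter that our expressions can be simplified by choosing length parametrization after the first set of derivatives has been taken, and that this is necessary to interpret $s$ as the interval between spacetime points.${}^{3}$ The equations of motion can be written in the form of the geodesic equation
\begin{equation*}
\frac{\mathrm{d}^{2}x^{\mu}}{\mathrm{d}\lambda^{2}}+\Gamma^{\mu}{}_{\alpha\beta}\frac{\mathrm{d}x^{\alpha}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\beta}}{\mathrm{d}\lambda}=0.
\end{equation*}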
${}^{1}$ A world line which, we very much hope, will avoid any cliff falls.
$\boldsymbol{g}\Longrightarrow\mathrm{d}s^{2}\Longrightarrow\Gamma$
Fig. 9.1 The metric generates the connection coefficients.
${}^{2}$ For spacelike curves we have $\mathrm{d}s^{2}>0$, and we can write the interval as $\Delta l=\int_{a}^{b}\mathrm{d}\lambda\left(g_{\mu\nu}\frac{\mathrm{d}x^{\mu}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\nu}}{\mathrm{d}\lambda}\right)^{1/2}$.
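This is the proper length along the curve. Massive particles traverse timelike curves, which have $\mathrm{d}s^{2}<0$. We can write the timelike interval as $\Delta\tau=\int_{a}^{b}\mathrm{d}\lambda\left(-g_{\mu\nu}\frac{\mathrm{d}x^{\mu}}{\mathrm{d}\lambda}\frac{\mathrm{d}x^{\nu}}{\mathrm{d}\lambda}\right)^{1/2}$.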
This equation gives us the proper time that elapses for an observer travelling along the world line.
${}^{3}$ We described the need for this in the previous chapter. For massive particles, which follow timelike geodesics, length parametrization, involving setting $\lambda=\tau$, is also needed to ensure that the velocity vector $\boldsymbol{u}$, which is the tangent to the particle's world line, is constrained according to $\boldsymbol{u}\cdot\boldsymbol{u}=-1$.
The rest of this chapter goes through in detail how to extract connection coefficients and calculate geodesics. A reader impatient to get on to the heart of general relativity can skip the rest of this chapter on first reading.
${}^{4}$ These geometries reappear in several of the exercises and examples later in the book.
Tips for these calculations:
To save on writing, in step I it's sometimes useful to write $\mathrm{d}x^{\mu}/\mathrm{d}\lambda$ as $\dot{x}^{\mu}$.
In step IV, we often employ the chain rule, noting that $\frac{\mathrm{d}}{\mathrm{d}\lambda}f(x^{\mu})=\frac{\partial f(x^{\mu})}{\partial x^{\mu}}\frac{\mathrm{d}x^{\mu}}{\mathrm{d}\lambda}$.
In step V, equations of motion of the form $\ddot{x}^{\mu}+2F\dot{x}^{\alpha}\dot{x}^{\beta}=0$ for $\alpha\neq\beta$ yield $\Gamma^{\mu}{}_{\alpha\beta}=F$, owing to the summation convention in the geodesic equation.
The length parametrization condition $L=1$ is often a useful, additional constraint when solving the equations of motion.
By comparing the equations of motion with the geodesic equation, we can simply read off the connection coefficients.
This procedure formalizes the method used in the examples in the previous chapter. It can be summarized as follows:
Step I: From the metric line element $\mathrm{d}s^{2}$, write a parametrized expression for the spacetime interval $s=\int L\,\mathrm{d}\lambda$, where $\lambda$ is the parameter.
Step II: Calculate $\frac{\partial L}{\partial(\mathrm{d}x^{\mu}/\mathrm{d}\lambda)}$ and $\frac{\partial L}{\partial x^{\mu}}$.
Step III: Choose length parametrization such that $L=1$.
Step IV: Calculate $\frac{\mathrm{d}}{\mathrm{d}\lambda}\frac{\partial L}{\partial(\mathrm{d}x^{\mu}/\mathrm{d}\lambda)}$ and insert the values into the E-L equations.
Step V: Read off the connection coefficients, remembering that $\Gamma^{\mu}{}_{\alpha\beta}=\Gamma^{\mu}{}_{\beta\alpha}$.
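Hand calculations with these five steps can be cross-checked against the standard closed-form expression $\Gamma^{\mu}{}_{\alpha\beta}=\tfrac{1}{2}g^{\mu\nu}(\partial_{\alpha}g_{\nu\beta}+\partial_{\beta}g_{\nu\alpha}-\partial_{\nu}g_{\alpha\beta})$, the kind of general metric-to-connection result that Section 9.2 works towards. The Python sketch below takes that formula instead of the Euler-Lagrange route. It computes metric derivatives by central differences and inverts the metric with a small Gauss-Jordan routine; the function names are hypothetical. For the unit sphere it should return $\Gamma^{\theta}{}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}{}_{\theta\phi}=\cot\theta$.

```python
import math

def inverse(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]

def christoffel(metric, x, h=1e-5):
    """Γ^μ_{αβ} = ½ g^{μν}(∂_α g_{νβ} + ∂_β g_{να} - ∂_ν g_{αβ}), returned as G[μ][α][β]."""
    n = len(x)
    d = []                                   # d[k][i][j] = ∂_k g_ij (central differences)
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        d.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    ginv = inverse(metric(x))
    return [[[0.5 * sum(ginv[mu][nu] * (d[al][nu][be] + d[be][nu][al] - d[nu][al][be])
                        for nu in range(n))
              for be in range(n)]
             for al in range(n)]
            for mu in range(n)]

def unit_sphere(x):
    """ds² = dθ² + sin²θ dφ² in coordinates x = (θ, φ)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

th = 0.7
G = christoffel(unit_sphere, [th, 0.0])
print(G[0][1][1], -math.sin(th) * math.cos(th))   # Γ^θ_φφ
print(G[1][0][1], math.cos(th) / math.sin(th))    # Γ^φ_θφ
```

The routine accepts non-diagonal metrics, which is where the Euler-Lagrange bookkeeping becomes most error-prone.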
We shall work through a number of examples demonstrating how to extract connection coefficients. In each case, we start with a metric and end with connection coefficients.${}^{4}$
Example 9.1
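Let's try the two-dimensional space on the surface of a unit sphere. The interval (step I) is
\begin{equation*}
s=\int \mathrm{d}\lambda\left(\dot{\theta}^{2}+\sin^{2}\theta\,\dot{\phi}^{2}\right)^{1/2}.
\end{equation*}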
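Therefore, the metric has components $g_{\theta\theta}=1$ and $g_{\phi\phi}=\sin^{2}\theta$. We saw [in Exercise 8.1 from the last chapter] that (following steps II-IV) the equations of motion are
\begin{equation*}
\ddot{\theta}-\sin\theta\cos\theta\,\dot{\phi}^{2}=0,\qquad \ddot{\phi}+2\cot\theta\,\dot{\theta}\dot{\phi}=0,
\end{equation*}
from which (step V) we read off $\Gamma^{\theta}{}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}{}_{\theta\phi}=\Gamma^{\phi}{}_{\phi\theta}=\cot\theta$.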
We can examine different curved surfaces, such as the parabolic space of the next example.
Example 9.2
A parabolic surface with line element $\mathrm{d}s^{2}=(1+a^{2}r^{2})\,\mathrm{d}r^{2}+r^{2}\,\mathrm{d}\theta^{2}$ has interval (step I)
\begin{equation*}
s=\int \mathrm{d}\lambda\left[(1+a^{2}r^{2})\dot{r}^{2}+r^{2}\dot{\theta}^{2}\right]^{1/2}.
\end{equation*}
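Following steps II-V gives the non-zero connection coefficients
\begin{equation*}
\Gamma^{r}{}_{rr}=\frac{a^{2}r}{1+a^{2}r^{2}},\qquad \Gamma^{r}{}_{\theta\theta}=-\frac{r}{1+a^{2}r^{2}},\qquad \Gamma^{\theta}{}_{r\theta}=\Gamma^{\theta}{}_{\theta r}=\frac{1}{r}.
\end{equation*}
As expected, these reduce to the flat-plane connection coefficients in the case that $a=0$.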
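For a diagonal metric the connection coefficients take a particularly simple form, with no sums. For $\mu\neq\nu$ they are $\Gamma^{\mu}{}_{\mu\mu}=\partial_{\mu}g_{\mu\mu}/(2g_{\mu\mu})$, $\Gamma^{\mu}{}_{\nu\nu}=-\partial_{\mu}g_{\nu\nu}/(2g_{\mu\mu})$ and $\Gamma^{\mu}{}_{\mu\nu}=\Gamma^{\mu}{}_{\nu\mu}=\partial_{\nu}g_{\mu\mu}/(2g_{\mu\mu})$. The Python sketch below uses hypothetical helper names and finite-difference derivatives. It applies these formulas to the parabolic surface and checks them against $\Gamma^{r}{}_{rr}=a^{2}r/(1+a^{2}r^{2})$, $\Gamma^{r}{}_{\theta\theta}=-r/(1+a^{2}r^{2})$ and $\Gamma^{\theta}{}_{r\theta}=1/r$, including the flat-plane limit $a=0$.

```python
import math

def diag_christoffel(diag, x, h=1e-6):
    """Γ^μ_{αβ} of a diagonal metric keyed by (μ, α, β); diag(x) returns [g_00, g_11, ...].
    Coefficients with three distinct indices vanish and are omitted."""
    n = len(x)
    g = diag(x)

    def d(k, i):                    # ∂_k g_ii by central differences
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        return (diag(xp)[i] - diag(xm)[i]) / (2 * h)

    gamma = {}
    for mu in range(n):
        gamma[(mu, mu, mu)] = d(mu, mu) / (2 * g[mu])
        for nu in range(n):
            if nu != mu:
                gamma[(mu, nu, nu)] = -d(mu, nu) / (2 * g[mu])
                gamma[(mu, mu, nu)] = gamma[(mu, nu, mu)] = d(nu, mu) / (2 * g[mu])
    return gamma

def paraboloid(a):
    """ds² = (1 + a²r²) dr² + r² dθ² in coordinates x = (r, θ)."""
    return lambda x: [1 + a**2 * x[0]**2, x[0]**2]

a, r = 0.5, 1.3
G = diag_christoffel(paraboloid(a), [r, 0.0])
print(G[(0, 0, 0)], a**2 * r / (1 + a**2 * r**2))    # Γ^r_rr
print(G[(0, 1, 1)], -r / (1 + a**2 * r**2))          # Γ^r_θθ
print(G[(1, 0, 1)], 1 / r)                            # Γ^θ_rθ
flat = diag_christoffel(paraboloid(0.0), [r, 0.0])
print(flat[(0, 0, 0)], flat[(0, 1, 1)], flat[(1, 0, 1)])  # flat plane: 0, -r, 1/r
```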
We can examine more exotic spaces still, such as the interesting Poincaré half plane.${}^{5}$
${}^{5}$ Henri Poincaré (1854-1912). The Poincaré half plane provides a model of hyperbolic geometry. See Exercise 9.8 for an introduction to the Poincaré half plane and Chapters 16 and 19 for more discussion of hyperbolic spaces.
The Poincaré half plane has a metric
\begin{equation*}
\mathrm{d}s^{2}=\frac{\mathrm{d}x^{2}+\mathrm{d}r^{2}}{r^{2}},
\end{equation*}
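which is defined for $r>0$. The interval (step I) is
\begin{equation*}
s=\int \mathrm{d}\lambda\,\frac{\left(\dot{x}^{2}+\dot{r}^{2}\right)^{1/2}}{r}.
\end{equation*}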
The geodesics themselves turn out to be circular arcs and are examined in Exercise 9.8.
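The circular-arc geodesics can be previewed numerically. For the half-plane metric $\mathrm{d}s^{2}=(\mathrm{d}x^{2}+\mathrm{d}r^{2})/r^{2}$, the non-zero coefficients are $\Gamma^{x}{}_{xr}=\Gamma^{x}{}_{rx}=-1/r$, $\Gamma^{r}{}_{xx}=1/r$ and $\Gamma^{r}{}_{rr}=-1/r$. The $x$ equation is therefore the one quoted in Exercise 9.8. The Python sketch below starts at the top of the unit semicircle, $(x,r)=(0,1)$, moving horizontally at unit speed, and integrates forward. It checks that $x^{2}+r^{2}=1$ is maintained. For this start the exact solution is $x=\tanh\lambda$, $r=\operatorname{sech}\lambda$.

```python
import math

def rhs(state):
    """Geodesic equations for ds² = (dx² + dr²)/r²; state = [x, r, ẋ, ṙ]."""
    x, r, xd, rd = state
    xdd = 2.0 * xd * rd / r            # -2Γ^x_xr ẋṙ, with Γ^x_xr = -1/r
    rdd = (rd**2 - xd**2) / r          # -Γ^r_xx ẋ² - Γ^r_rr ṙ², with Γ^r_xx = 1/r, Γ^r_rr = -1/r
    return [xd, rd, xdd, rdd]

def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = rhs([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = rhs([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

state = [0.0, 1.0, 1.0, 0.0]           # top of the unit semicircle; (ẋ² + ṙ²)/r² = 1
path = [state]
for _ in range(3000):                  # λ from 0 to 3
    state = rk4_step(state, 1e-3)
    path.append(state)
print("max |x² + r² - 1| =", max(abs(x**2 + r**2 - 1.0) for x, r, _, _ in path))
print("x(3) =", path[-1][0], " tanh(3) =", math.tanh(3.0))
```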
9.2 The geodesic equation from the action
${}^{6}$ Lots more examples can be found in the exercises.
${}^{7}$ Although the resulting expression is useful, it is often quicker in practice to use the five-step method to extract the coefficients directly from the action.
So far we have looked at a selection of special cases, evaluating spacelike intervals for space-only metrics with a $(+++)$ signature.${}^{6}$ However, using the Euler-Lagrange equations, we should be able to derive the equation of motion for a massive particle in a general spacetime, once and for all, from the action
This action is proportional to the proper time interval along a world line parametrized by the proper time $\tau$. We can therefore extremize the action using the same procedure as before. We already know what the answer must be: the geodesic equation from the previous chapter. However, this procedure will also provide a useful and simple formula for extracting the connection coefficients directly from the metric.${}^{7}$
We define the all-down-index connection coefficients $\Gamma_{\lambda\mu\sigma}=g_{\rho\lambda}\Gamma^{\rho}{}_{\mu\sigma}$, and then we have the following.${}^{8}$
The conclusion of this lengthy exercise${}^{9}$ is that, given only the metric, we can work out connection coefficients and have access to the equation of motion of the freely falling particle. Note, however, that since the geodesic equation applies beyond the timelike geodesics followed by massive particles, eqn 9.29 is a general, geometrical expression linking the metric with the connection coefficients.
This corresponds to a metric with components $g_{tt}=-1/t^{2}$ and $g_{xx}=1/t^{2}$. The connection coefficients can be calculated using eqn 9.28, to yield
\begin{equation*}
\Gamma_{ttt}=\frac{1}{t^{3}},\quad \Gamma_{xxt}=\Gamma_{xtx}=-\Gamma_{txx}=-\frac{1}{t^{3}}.
\end{equation*}
Using $g^{tt}=-t^{2}$ and $g^{xx}=t^{2}$, we have $\Gamma^{t}{}_{tt}=\Gamma^{t}{}_{xx}=\Gamma^{x}{}_{xt}=-1/t$.
${}^{8}$ In comma notation
Here $\mu\neq\nu\neq\lambda$ and we don't sum over repeated indices.
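The numbers for the metric with $g_{tt}=-1/t^{2}$ and $g_{xx}=1/t^{2}$ can be reproduced in two stages. First apply the all-down-index formula $\Gamma_{\lambda\mu\sigma}=\tfrac{1}{2}(\partial_{\mu}g_{\lambda\sigma}+\partial_{\sigma}g_{\lambda\mu}-\partial_{\lambda}g_{\mu\sigma})$, which is consistent with the definition $\Gamma_{\lambda\mu\sigma}=g_{\rho\lambda}\Gamma^{\rho}{}_{\mu\sigma}$. Then raise the first index with the diagonal inverse metric. The Python sketch below uses finite-difference derivatives and is evaluated at the arbitrary point $t=2$, where $\pm1/t^{3}=\pm0.125$ and $-1/t=-0.5$.

```python
def metric(x):
    """g = diag(-1/t², 1/t²) in coordinates x = (t, x)."""
    t = x[0]
    return [[-1.0 / t**2, 0.0], [0.0, 1.0 / t**2]]

def christoffel_down(metric, x, h=1e-6):
    """Γ_{λμσ} = ½(∂_μ g_{λσ} + ∂_σ g_{λμ} - ∂_λ g_{μσ}), returned as G[λ][μ][σ]."""
    n = len(x)
    d = []                                   # d[k][i][j] = ∂_k g_ij
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        d.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)])
    return [[[0.5 * (d[m][l][s] + d[s][l][m] - d[l][m][s]) for s in range(n)]
             for m in range(n)] for l in range(n)]

t = 2.0
low = christoffel_down(metric, [t, 0.0])
ginv = [-t**2, t**2]                         # diagonal inverse metric: g^tt, g^xx
up = [[[ginv[r] * low[r][m][s] for s in range(2)] for m in range(2)] for r in range(2)]
print(low[0][0][0], low[1][1][0], low[0][1][1])   # Γ_ttt, Γ_xxt, Γ_txx
print(up[0][0][0], up[0][1][1], up[1][1][0])      # Γ^t_tt, Γ^t_xx, Γ^x_xt
```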
Chapter summary
The connection coefficients may be extracted using a simple routine based on extremizing the action.
The metric leads directly to the connection coefficients.
Exercises
(9.1) Using the methods described in the chapter, extract the connection coefficients for two-dimensional plane polar coordinates.
(9.2) The torus has a line element
\begin{equation*}
\mathrm{d}s^{2}=(c+a\cos v)^{2}\,\mathrm{d}u^{2}+a^{2}\,\mathrm{d}v^{2}.
\end{equation*}
Show that we obtain the connection coefficients
\begin{equation*}
\Gamma_{u u}^{v}=\frac{\sin v}{a}(c+a \cos v), \quad \Gamma_{u v}^{u}=-\frac{a \sin v}{(c+a \cos v)} . \tag{9.36}
\end{equation*}
(9.3) Consider the non-diagonal metric
\begin{equation*}
\mathrm{d} s^{2}=\mathrm{d} u^{2}+\mathrm{d} v^{2}+2 \mathrm{~d} u \mathrm{~d} v \cos \theta(u, v) \tag{9.37}
\end{equation*}
Show that the non-zero connection coefficients are given by
and extract the connection coefficients.
(b) We can use the physical interpretation of the length parametrization to gain some insight into this spacetime. Define the velocity in the $x$ direction as $v=\mathrm{d}x/\mathrm{d}t$ and show that
Now suppose that the particle does not have a velocity in the $x$-direction, so that $v=0$. The equation of motion says that in order to have $v=0$ in this spacetime we must have an observer undergoing a uniform acceleration $\ddot{x}=-1/x$, which diverges as $x\rightarrow 0$. We shall see this property again when we examine the spherically symmetric Schwarzschild geometry.
(9.5) Consider the rotating-frame line element
\begin{align*}
\mathrm{d} s^{2} & =-\left[1-\Omega^{2}\left(x^{2}+y^{2}\right)\right] \mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2} \\
& +\mathrm{d} z^{2}-2 \Omega y \mathrm{~d} x \mathrm{~d} t+2 \Omega x \mathrm{~d} y \mathrm{~d} t \tag{9.42}
\end{align*}
(a) Find the matrix $g^{\mu\nu}$.
(b) Compute the connection coefficients for the space described by this line element.
(9.6) (a) Express the line element from the previous question in cylindrical polars.
(b) Compute the connection coefficients in cylindrical polars.
(9.7) The Schwarzschild metric gives an interval
where $\Phi$ and $\Lambda$ are functions of $r$. Find the connection coefficients.
(9.8) We can start to understand the space represented by the Poincaré half plane in Example 9.3 by computing its geodesics. The geometry is defined by its metric in the upper half plane, $r>0$, only.
(a) Consider the equation of motion $\ddot{x}-2 \dot{r} \dot{x} / r=0$, where the dot indicates a derivative with respect to the affine parameter $\lambda$. Show that this equation is solved by
where $r=a \sin t$.
(c) Use these results to show that $x$ is given by
\begin{equation*}
x=-a \cos t+x_{0} \tag{9.46}
\end{equation*}
where $x_{0}$ is a constant offset.
(d) Argue that this shows that the geodesics are circular arcs, centred on $(x, r)=\left(x_{0}, 0\right)$ with radius $a$, as shown in Fig. 9.2.
(e) Compute the length $\int \mathrm{d} \lambda$ of a geodesic starting at $t=a$ and finishing at $t=b$. Use this to show that the length of a geodesic starting at $t=0$ and ending at $t=\pi$ is infinite.
You can see why this is the case by considering a ruler of interval length $\Delta s$ parallel to the $x$ axis. Since $r$ is constant along the ruler, we have $\Delta s=\Delta x / r$. As a result, rulers of an equivalent interval length $\Delta s$ must have larger coordinate length $\Delta x$ if they are at a larger height $r$, as shown in Fig. 9.2.
Fig. 9.2 The Poincaré half plane from Exercise 9.8 and Example 9.3. An example geodesic is shown on the right. On the left several lines of equivalent interval length $\Delta s$ are shown.
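For part (e), a quick numerical check can help. The sketch below assumes that the Poincaré half-plane metric of Example 9.3 is $\mathrm{d}s^{2}=(\mathrm{d}x^{2}+\mathrm{d}r^{2})/r^{2}$ (an assumption here, though it is consistent with the ruler rule $\Delta s=\Delta x/r$ above), and the helper names `arc_length_numeric` and `arc_length_closed_form` are hypothetical. To avoid a clash with the radius $a$, the end points are called `t_start` and `t_end`. Along the arc the integrand reduces to $1/\sin t$, so the length should be $\ln[\tan(t_{\mathrm{end}}/2)/\tan(t_{\mathrm{start}}/2)]$, independent of $a$ and divergent as either end approaches $t=0$ or $t=\pi$.

```python
import math


def arc_length_numeric(a, t_start, t_end, n=100_000):
    """Midpoint-rule length of the arc x = -a cos t + x0, r = a sin t.

    Assumes the half-plane metric ds^2 = (dx^2 + dr^2) / r^2. The offset x0
    drops out because only dx/dt enters the integrand.
    """
    h = (t_end - t_start) / n
    total = 0.0
    for k in range(n):
        t = t_start + (k + 0.5) * h
        dx_dt = a * math.sin(t)   # d/dt of -a cos t + x0
        dr_dt = a * math.cos(t)   # d/dt of a sin t
        r = a * math.sin(t)
        total += math.hypot(dx_dt, dr_dt) / r * h
    return total


def arc_length_closed_form(t_start, t_end):
    """Integral of dt / sin t, i.e. ln tan(t/2) between the limits (independent of a)."""
    return math.log(math.tan(t_end / 2) / math.tan(t_start / 2))


if __name__ == "__main__":
    print(arc_length_numeric(2.0, 0.5, 2.0), arc_length_closed_form(0.5, 2.0))
    # Pushing the start towards t = 0 (the boundary r = 0) makes the length grow without bound.
    for eps in (1e-1, 1e-3, 1e-6):
        print(eps, arc_length_closed_form(eps, math.pi / 2))
```

The midpoint rule avoids evaluating the integrand exactly at the end points, which matters if one later pushes `t_start` very close to zero.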
(9.9) We end up with the same geodesics if we extremize $L=\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}$ and if we extremize $L=\frac{1}{2} g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}$. Show this by finding the Euler-Lagrange equation for a function
where $\dot{x}^{\mu}=\mathrm{d} x^{\mu} / \mathrm{d} \lambda$, $\lambda$ is the proper length and $F$ is any monotonic function.
This means we can equally well use a Lagrangian $L=\frac{1}{2} g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}$, which resembles the kinetic energy of a non-relativistic particle.
(9.10) Consider the moving-coordinate metric
\begin{equation*}
\mathrm{d} s^{2}=-\left(1-v^{2}\right) \mathrm{d} t^{2}+\mathrm{d} x^{2}+\mathrm{d} y^{2}+\mathrm{d} z^{2}-2 v \mathrm{~d} x \mathrm{~d} t . \tag{9.48}
\end{equation*}
By extremizing the world line, show that the geodesics are straight lines.
(9.11) Consider a two-dimensional space with metric line element
By computing the acceleration, determine whether the curve $r(\lambda)=(3 \lambda / 2)^{2 / 3}-1$, $\phi(\lambda)=0$, with $\lambda$ an affine parameter, is a geodesic.
10
10.1 Observers and their observations
10.2 Coordinate and non-coordinate bases
10.3 The orthonormal frame
10.4 Freely falling frames
Chapter summary
Exercises
${}^{1}$ Or, perhaps more memorably, in the words of the Time Traveller, 'There is no difference between Time and any of the three dimensions of Space except that our consciousness moves along it. But some foolish people have got hold of the wrong side of that idea.' H. G. Wells (1866-1946) The Time Machine.
Fig. 10.1 A measurement of momentum $\boldsymbol{p}$. In this figure, we use the familiar $\hat{x}$ and $\hat{y}$ axes of two-dimensional space (with basis vectors $\boldsymbol{e}_{\hat{x}}$ and $\boldsymbol{e}_{\hat{y}}$), but the idea carries over to the four-dimensional spacetime axes of an orthonormal frame with basis vectors $\boldsymbol{e}_{\hat{0}}, \boldsymbol{e}_{\hat{1}}, \boldsymbol{e}_{\hat{2}}$ and $\boldsymbol{e}_{\hat{3}}$.
Making measurements in relativity
Abstract
Why were another seven years required for the construction of the general theory of relativity? The main reason lies in the fact that it is not easy to free oneself from the idea that coordinates must have an immediate metrical meaning. Einstein quoted in P. A. Schilpp (ed.) Albert Einstein - Philosopher Scientist (1969).
The stage on which the drama of general relativity is played out is curved spacetime. However, measurements are made locally by observers in laboratories. Over the small distances involved in a typical experiment, an observer will experience spacetime as if it were flat spacetime with the Minkowski metric. We therefore need to know how to relate the observations made by observers in their local spacetime to the objects we manipulate in the curved spacetime of general relativity. The key is that measurements are made in local orthonormal frames: frames of reference set up by observers where the basis vectors are mutually orthogonal and normalized. In this book, component labels in such frames will be given with hats, so that the basis vectors, for example, will be written as $\boldsymbol{e}_{\hat{\alpha}}$. By definition, the metric of the flat, local frame is simply the Minkowski metric, with components $\eta_{\hat{\mu} \hat{\nu}}=\boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{\nu}}=\operatorname{diag}(-1,1,1,1)$.
A further point to note in these discussions stems from Einstein's observation in the quotation above. When presented with a $t$ coordinate or an $r$ coordinate, it is tempting to assume that $t$ must represent time and $r$ the radius. This is not correct. Coordinates are intrinsically meaningless labels which are only given meaning by relating them to measurements and intervals determined by observers in their local, inertial frames of reference. This can be summed up using the slogan that coordinates have no immediate metrical significance.${}^{1}$
10.1 Observers and their observations
A particle passes through a laboratory as shown in Fig. 10.1. The particle has a momentum $\boldsymbol{p}$ which, expressed as a vector, is a quantity independent of any set of coordinates. An observer in the laboratory makes measurements by carrying around their own orthonormal axes $\boldsymbol{e}_{\hat{0}}, \boldsymbol{e}_{\hat{1}}, \boldsymbol{e}_{\hat{2}}$ and $\boldsymbol{e}_{\hat{3}}$. Referred to these axes, the vector can be expressed in components as $\boldsymbol{p}=p^{\hat{\mu}} \boldsymbol{e}_{\hat{\mu}}$. To make a measurement of a particular component of a vector the observer projects out the component using their local axes. For example, measuring the momentum along the $\hat{\alpha}$ direction means that the observer makes the projection via a dot product
\begin{equation*}
\boldsymbol{p} \cdot \boldsymbol{e}_{\hat{\alpha}}=p^{\hat{\mu}} \boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{\alpha}}=\eta_{\hat{\mu} \hat{\alpha}} p^{\hat{\mu}}=p_{\hat{\alpha}},
\end{equation*}
showing that the observer has access to the component $p_{\hat{\alpha}}$. The observer can use $\eta^{\hat{\alpha} \hat{\beta}}$ to raise the index if they want the up-index form of the component, via $p^{\hat{\beta}}=\eta^{\hat{\alpha} \hat{\beta}} p_{\hat{\alpha}}$.
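The projection just described can be sketched numerically, working entirely in the observer's orthonormal frame so that the metric is $\eta_{\hat{\mu} \hat{\nu}}=\operatorname{diag}(-1,1,1,1)$. The function names `dot`, `basis_vector`, `measure` and `raise_index` are illustrative, and a vector is represented simply as a Python list of its up-index components $p^{\hat{\mu}}$.

```python
# Minkowski metric eta_{mu nu} = diag(-1, 1, 1, 1) of the observer's orthonormal frame.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def dot(u, v):
    """Scalar product eta_{mu nu} u^mu v^nu of two 4-vectors given by up-index components."""
    return sum(ETA[m][n] * u[m] * v[n] for m in range(4) for n in range(4))


def basis_vector(alpha):
    """Components of e_alpha-hat in the observer's own frame: 1 in slot alpha, 0 elsewhere."""
    return [1.0 if m == alpha else 0.0 for m in range(4)]


def measure(p_up):
    """Project p onto each axis: p . e_alpha-hat is the down-index component p_alpha-hat."""
    return [dot(p_up, basis_vector(alpha)) for alpha in range(4)]


def raise_index(p_down):
    """p^beta = eta^{alpha beta} p_alpha; eta^{alpha beta} has the same entries as eta_{alpha beta}."""
    return [sum(ETA[a][b] * p_down[a] for a in range(4)) for b in range(4)]


p_up = [5.0, 3.0, 0.0, 4.0]
p_down = measure(p_up)   # [-5.0, 3.0, 0.0, 4.0]: only the time component flips sign
print(p_down, raise_index(p_down))
```

Raising the index with $\eta^{\hat{\alpha}\hat{\beta}}$ returns the original up-index components, which is the round trip described in the text.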
If we spot an observer, how do we know what their local orthonormal axes will look like? That is, how will they orient their orthonormal coordinates? Start by noting that the observer's world line is characterized by their velocity vector $\boldsymbol{u}_{\mathrm{obs}}$, which is tangent to the world line (Fig. 10.2). The key is that the timelike vector of the local basis $\boldsymbol{e}_{\hat{0}}$ will also be tangent to the observer's world line, since this is the direction that a clock at rest in the observer's frame moves in spacetime. We therefore have
\begin{equation*}
\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}_{\mathrm{obs}} .
\end{equation*}
Therefore, expressed in some coordinate frame, the observer's timelike axis $\boldsymbol{e}_{\hat{0}}$ has the components we would ascribe to their tangent vector $\boldsymbol{u}_{\mathrm{obs}}$. We write these components of the observer's timelike basis vector as
\begin{equation*}
\left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}=u_{\mathrm{obs}}^{\mu} .
\end{equation*}
The other basis vectors of the observer's orthonormal system can then be picked out, subject to being orthogonal to $\boldsymbol{e}_{\hat{0}}$ and to each other.
Example 10.2
Consider the constantly accelerated observer in Minkowski space from Chapter 2. They have a timelike basis vector with components
Pick $\boldsymbol{e}_{\hat{2}}$ and $\boldsymbol{e}_{\hat{3}}$ to point along the $y$- and $z$-directions. The remaining 4-vector $\boldsymbol{e}_{\hat{1}}$ has the form $(f(\tau), g(\tau), 0,0)$. We require orthogonality of the observer's basis vectors, which is to say that
Fig. 10.2 Local orthonormal frames picked out along a world line by setting $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$.
${}^{2}$ We can evaluate the dot product using the Minkowski tensor in the rest frame of the particle. The result is a scalar, so it is true in any frame.
${}^{3}$ In our units, the vector $\boldsymbol{k}$ has components $k^{\mu}=\left(\omega, k^{x}, k^{y}, k^{z}\right)$ and $\omega=|\vec{k}|$ for light. We also assume the quantum mechanical relationship $E=\hbar \omega$, but set $\hbar=1$.
${}^{4}$ This result, discussed in the book by Hartle, will be seen again in the discussion of black holes in Chapters 26 and 27.
One particularly helpful tool is that the energy of a particle measured by an observer with velocity $\boldsymbol{u}_{\mathrm{obs}}$ is given by $E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}$. This is easily confirmed by noting that
\begin{equation*}
-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}=-\boldsymbol{p} \cdot \boldsymbol{e}_{\hat{0}}=-p_{\hat{0}}=E,
\end{equation*}
where, in the final step, we remember that $E=p^{\hat{0}}=-p_{\hat{0}}$ because $\eta_{\hat{0} \hat{0}}=-1$ in the orthonormal frame.
Example 10.3
Consider Minkowski space. In a frame where a particle is at rest, the particle has $p^{\mu}=(m, 0,0,0)$. Relative to this frame the observer travels with constant speed $v$ along the $x$-axis, so the 4-velocity of the observer has components $u_{\mathrm{obs}}^{\mu}=(\gamma, \gamma v, 0,0)$, which are therefore also the components of the local basis vector $\left(\boldsymbol{e}_{\hat{0}}\right)^{\mu}$. The energy of the particle measured by the observer, when the world lines of particle and observer intersect, is${}^{2}$
\begin{equation*}
E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}=-\eta_{\mu \nu} p^{\mu} u_{\mathrm{obs}}^{\nu}=\gamma m .
\end{equation*}
So the particle has energy $E=\gamma m c^{2}$ (restoring factors of $c$), as we expect.
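Example 10.3 reduces to a single evaluation of $E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}$ with the Minkowski metric. The sketch below works in units with $c=1$ and uses illustrative function names; for $v=0.6$ we have $\gamma=1.25$, so a particle of unit mass should be measured to have $E=1.25$.

```python
import math


def minkowski_dot(u, v):
    """eta_{mu nu} u^mu v^nu with eta = diag(-1, 1, 1, 1)."""
    return -u[0] * v[0] + u[1] * v[1] + u[2] * v[2] + u[3] * v[3]


def observer_velocity(v):
    """4-velocity (gamma, gamma v, 0, 0) of an observer moving at speed v along x (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]


def measured_energy(p, u_obs):
    """Energy an observer with 4-velocity u_obs assigns to momentum p: E = -p . u_obs."""
    return -minkowski_dot(p, u_obs)


m, v = 1.0, 0.6
u_obs = observer_velocity(v)
print(minkowski_dot(u_obs, u_obs))                  # close to -1: u_obs is normalized
print(measured_energy([m, 0.0, 0.0, 0.0], u_obs))   # close to gamma m = 1.25
```

Because $-\boldsymbol{p}\cdot\boldsymbol{u}_{\mathrm{obs}}$ is a scalar, the same number would come out if both vectors were first transformed to any other inertial frame.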
Now consider the accelerated observer from Example 10.2, measuring light from a star in Minkowski space. The star gives out light at a frequency $\omega$. The wave 4-vector of a photon reaching the observer has components${}^{3}$ $k^{\mu}=(\omega, \omega, 0,0)$ in the star's rest frame. The method for finding the frequency the observer measures is the same as for the energy, to which frequency is proportional. We evaluate $\omega_{\mathrm{obs}}=-\boldsymbol{k} \cdot \boldsymbol{u}_{\mathrm{obs}}$. Computing, we find
which demonstrates that the shift in observed frequency varies exponentially. ^(4){ }^{4}
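The orthonormal frame of Example 10.2 and this frequency shift can be sketched together. This assumes the standard hyperbolic-motion form $u^{\mu}=(\cosh a\tau, \sinh a\tau, 0, 0)$ for an observer with constant proper acceleration $a$ along $+x$; the parametrization used in Chapter 2 may differ, and the choice $f(\tau)=\sinh a\tau$, $g(\tau)=\cosh a\tau$ for $\boldsymbol{e}_{\hat{1}}$ is our assumption. Under these assumptions $-\boldsymbol{k}\cdot\boldsymbol{u}_{\mathrm{obs}}=\omega(\cosh a\tau-\sinh a\tau)=\omega e^{-a\tau}$, an exponential redshift for a photon travelling in the $+x$ direction. Function names are illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of the Minkowski metric


def minkowski_dot(u, v):
    """eta_{mu nu} u^mu v^nu for the diagonal Minkowski metric."""
    return sum(e * x * y for e, x, y in zip(ETA, u, v))


def accelerated_frame(a, tau):
    """Assumed orthonormal frame of an observer with constant proper acceleration a along +x.

    e0 = (cosh a tau, sinh a tau, 0, 0) is the 4-velocity; e1 = (sinh a tau, cosh a tau, 0, 0)
    is the unit spacelike vector orthogonal to it; e2 and e3 point along y and z.
    """
    ch, sh = math.cosh(a * tau), math.sinh(a * tau)
    return [
        [ch, sh, 0.0, 0.0],
        [sh, ch, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def observed_frequency(omega, a, tau):
    """omega_obs = -k . u_obs for a photon with k = (omega, omega, 0, 0) moving along +x."""
    k = [omega, omega, 0.0, 0.0]
    u_obs = accelerated_frame(a, tau)[0]
    return -minkowski_dot(k, u_obs)


frame = accelerated_frame(0.3, 2.0)
gram = [[round(minkowski_dot(ei, ej), 12) for ej in frame] for ei in frame]
print(gram)                                              # should reproduce diag(-1, 1, 1, 1)
print(observed_frequency(2.0, 0.3, 2.0), 2.0 * math.exp(-0.6))
```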
The procedure above allows us to understand what an observer will measure. However, this is complicated by the fact that the natural, orthonormal coordinate system that observers employ has an unpleasant mathematical property: it is a non-coordinate basis, as we now describe.
10.2 Coordinate and non-coordinate bases
Recall from Chapter 3 that plane polar coordinates had the property that $\left|\boldsymbol{e}_{r}\right|=1$ but that $\left|\boldsymbol{e}_{\theta}\right|=r$. That is, the length of the $\boldsymbol{e}_{\theta}$ basis vector is proportional to the distance away from the origin. This basis was derived from the Cartesian one by expanding its basis vectors in the Cartesian basis, with prefactors given by partial derivatives of the Cartesian coordinates with respect to $r$ and $\theta$. We call such a basis a coordinate basis.
We could choose to normalize $\boldsymbol{e}_{\theta}$ so that we have $\boldsymbol{e}_{\hat{\theta}}=\boldsymbol{e}_{\theta} / r$, which yields an orthonormal basis set $\boldsymbol{e}_{\hat{r}}\left(=\boldsymbol{e}_{r}\right)$ and $\boldsymbol{e}_{\hat{\theta}}$. Although this is the coordinate set we would most probably want to choose when plotting the position of events in the laboratory, it does not have the property that it is derivable directly from the Cartesian coordinates. That is to say that the pair $\boldsymbol{e}_{\hat{r}}$ and $\boldsymbol{e}_{\hat{\theta}}$ cannot both be written as an expansion in Cartesian basis vectors with prefactors given in terms of derivatives of the Cartesian components with respect to $r$ and $\theta$ (see eqn 3.7). We call such a basis a non-coordinate basis.${}^{5}$
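Footnote 5 notes that a non-coordinate basis can be recognized by its basis vectors failing to commute. The sketch below checks this numerically for plane polar coordinates by treating each basis vector as a vector field on the plane, written in Cartesian components, and evaluating the commutator (Lie bracket) $[\boldsymbol{X},\boldsymbol{Y}]^{i}=X^{j}\partial_{j}Y^{i}-Y^{j}\partial_{j}X^{i}$ with central finite differences. The step size `h` and the function names are illustrative choices. One expects $[\boldsymbol{e}_{r},\boldsymbol{e}_{\theta}]=0$ but $[\boldsymbol{e}_{\hat{r}},\boldsymbol{e}_{\hat{\theta}}]=-\boldsymbol{e}_{\hat{\theta}}/r$.

```python
import math


def e_r(x, y):
    """Coordinate basis vector e_r = (dx/dr, dy/dr) in Cartesian components; unit length."""
    rho = math.hypot(x, y)
    return (x / rho, y / rho)


def e_theta(x, y):
    """Coordinate basis vector e_theta = (dx/dtheta, dy/dtheta) = (-y, x); length r."""
    return (-y, x)


def e_theta_hat(x, y):
    """Normalized vector e_theta / r of the orthonormal (non-coordinate) basis."""
    rho = math.hypot(x, y)
    return (-y / rho, x / rho)


def lie_bracket(X, Y, x, y, h=1e-6):
    """[X, Y]^i = X^j d_j Y^i - Y^j d_j X^i at (x, y), using central differences."""
    def partials(F, i):
        d_dx = (F(x + h, y)[i] - F(x - h, y)[i]) / (2 * h)
        d_dy = (F(x, y + h)[i] - F(x, y - h)[i]) / (2 * h)
        return d_dx, d_dy

    Xv, Yv = X(x, y), Y(x, y)
    bracket = []
    for i in range(2):
        dY, dX = partials(Y, i), partials(X, i)
        bracket.append(Xv[0] * dY[0] + Xv[1] * dY[1] - Yv[0] * dX[0] - Yv[1] * dX[1])
    return tuple(bracket)


print(lie_bracket(e_r, e_theta, 1.0, 1.0))      # approximately (0, 0): coordinate basis
print(lie_bracket(e_r, e_theta_hat, 1.0, 1.0))  # approximately (0.5, -0.5) = -e_thetahat / r
```

Since $\boldsymbol{e}_{\hat{r}}=\boldsymbol{e}_{r}$, the same function `e_r` serves for both bases; only the angular vector changes.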
Example 10.4
Non-coordinate bases are useful for many problems. Consider, for example, the Kepler problem, which is conventionally discussed using an orthonormal basis and cylindrical coordinates. In such a frame, the velocity $\boldsymbol{v}=v^{\hat{r}} \boldsymbol{e}_{\hat{r}}+v^{\hat{\theta}} \boldsymbol{e}_{\hat{\theta}}$ is given by
\begin{equation*}
\boldsymbol{v}=\dot{r} \boldsymbol{e}_{\hat{r}}+r \dot{\theta} \boldsymbol{e}_{\hat{\theta}} .
\end{equation*}
We shall usually work in coordinate bases owing to their neat geometrical properties. However, observers work in local frames, which is where their measurements are made. These are chosen to be orthonormal and hence usually have non-coordinate bases. We therefore need to be able to transform between coordinate bases (where we do our calculations) and non-coordinate bases (where measurements are made).
The basis vectors in a coordinate basis are related to the metric via the definition
\begin{equation*}
g_{\mu \nu}=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu} .
\end{equation*}
${}^{5}$ As described in Chapter 3, a non-coordinate basis can be identified because its basis vectors don't commute, in contrast to the basis vectors of a coordinate basis, which do commute.
See Chapter 20 for a discussion of the Kepler problem.
Fig. 10.3 The change of the vector $\boldsymbol{e}_{\hat{r}}$ is in the direction $\boldsymbol{e}_{\hat{\theta}}$, while the change in $\boldsymbol{e}_{\hat{\theta}}$ is in the direction $-\boldsymbol{e}_{\hat{r}}$.
In order to transform between these two descriptions, we write the components of the orthonormal basis vectors in the coordinate frame as $\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\alpha}$, giving us an expression
\begin{equation*}
\boldsymbol{e}_{\hat{\beta}}=\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\alpha} \boldsymbol{e}_{\alpha} .
\end{equation*}
Objects such as $\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\alpha}$ are a set of matrices known as the components of a vielbein.${}^{6}$ These turn out to be very useful.${}^{7}$ We can also write the components of the coordinate basis vectors in the orthonormal frame, which leads to the expression
\begin{equation*}
\boldsymbol{e}_{\mu}=\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}} \boldsymbol{e}_{\hat{\alpha}} .
\end{equation*}
${}^{6}$ Which translates from German into English as many-leg. Since the vielbein with which we're concerned describes (3+1)-dimensional spacetime, it is sometimes called a vierbein ($\equiv$ 4-leg). The bracket in the notation $\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}$ is really just there for aesthetic reasons to remind us that a vielbein combines information about two different sorts of coordinate systems. We could write the components $\boldsymbol{e}_{\mu}{}^{\hat{\alpha}}$ if we prefer.
${}^{7}$ One way to think of the action of a vielbein is that the coordinate frame possesses a set of global coordinates [e.g. $(t, r, \theta, \phi)$] and that we make them local using the vielbein.
${}^{8}$ Our use of this notation here follows Hartle. We will record vielbein components in margin notes throughout the book. The vielbein components in this case are $\left(\boldsymbol{e}_{\hat{r}}\right)^{r}=1$, $\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=1 / r$, $\left(\boldsymbol{e}_{r}\right)^{\hat{r}}=1$, $\left(\boldsymbol{e}_{\theta}\right)^{\hat{\theta}}=r$.
Consider cylindrical-polar coordinates. The coordinate basis consists of vectors $\boldsymbol{e}_{r}$ and $\boldsymbol{e}_{\theta}$ and we write components $(r, \theta)$. The metric is given by the line element
\begin{equation*}
\mathrm{d} s^{2}=\mathrm{d} r^{2}+r^{2} \mathrm{~d} \theta^{2} .
\end{equation*}
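The off-diagonal elements of the metric are zero, while the diagonal components are $g_{r r}=1$ and $g_{\theta \theta}=r^{2}$. From $g_{\mu \nu}=\boldsymbol{e}_{\mu} \cdot \boldsymbol{e}_{\nu}$, the coordinate basis vectors obey
\begin{equation*}
\boldsymbol{e}_{r} \cdot \boldsymbol{e}_{r}=1, \quad \boldsymbol{e}_{\theta} \cdot \boldsymbol{e}_{\theta}=r^{2}, \quad \boldsymbol{e}_{r} \cdot \boldsymbol{e}_{\theta}=0 .
\end{equation*}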
Let's first look at how vielbein notation works. Quite trivially, we can say that the components of the coordinate basis vectors in the coordinate basis are, by definition,
\begin{equation*}
\left(\boldsymbol{e}_{\mu}\right)^{\nu}=\delta_{\mu}^{\nu} .
\end{equation*}
How do we know that the orthonormal vectors we have selected are correct? The key is that they must obey the defining relationship $\eta_{\hat{\mu} \hat{\nu}}=\boldsymbol{e}_{\hat{\mu}} \cdot \boldsymbol{e}_{\hat{\nu}}=g_{\alpha \beta}\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\alpha}\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\beta}$, and this is quickly checked using $g_{r r}=1$ and $g_{\theta \theta}=r^{2}$. For example,
\begin{equation*}
\boldsymbol{e}_{\hat{\theta}} \cdot \boldsymbol{e}_{\hat{\theta}}=g_{\theta \theta}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=r^{2} \times \frac{1}{r} \times \frac{1}{r}=1 .
\end{equation*}
This shows that our choice is correct, since $\eta_{\hat{r} \hat{r}}=\eta_{\hat{\theta} \hat{\theta}}=1$. Similarly, it follows that the coordinate basis vectors in the orthonormal basis are
\begin{equation*}
\left(\boldsymbol{e}_{r}\right)^{\hat{r}}=1, \quad\left(\boldsymbol{e}_{\theta}\right)^{\hat{\theta}}=r .
\end{equation*}
These must obey the defining relationship $\boldsymbol{e}_{\mu}(x) \cdot \boldsymbol{e}_{\nu}(x)=g_{\mu \nu}(x)$, which they do.
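Both checks in this example can be automated. The helper `transform_metric` below (an illustrative name, not from the text) forms $M_{ab}=g_{\mu\nu}(\boldsymbol{e}_{a})^{\mu}(\boldsymbol{e}_{b})^{\nu}$ for any set of frame vectors. With the polar vielbein of footnote 8 and $g=\operatorname{diag}(1,r^{2})$ it should return the identity (the two-dimensional Euclidean $\eta_{\hat{\mu}\hat{\nu}}$), and with the inverse components and the identity metric it should return $g_{\mu\nu}$.

```python
def transform_metric(metric, frame):
    """M_ab = metric_{mu nu} frame[a][mu] frame[b][nu].

    frame[a] lists the components of the a-th new basis vector in the old basis.
    """
    n = len(metric)
    return [[sum(metric[m][k] * frame[a][m] * frame[b][k]
                 for m in range(n) for k in range(n))
             for b in range(len(frame))]
            for a in range(len(frame))]


def polar_metric(r):
    """g_{mu nu} = diag(1, r^2) in coordinates (r, theta)."""
    return [[1.0, 0.0], [0.0, r * r]]


def polar_vielbein(r):
    """(e_rhat)^mu = (1, 0), (e_thetahat)^mu = (0, 1/r)."""
    return [[1.0, 0.0], [0.0, 1.0 / r]]


def polar_inverse_vielbein(r):
    """(e_r)^muhat = (1, 0), (e_theta)^muhat = (0, r)."""
    return [[1.0, 0.0], [0.0, r]]


r = 2.0
identity = [[1.0, 0.0], [0.0, 1.0]]
print(transform_metric(polar_metric(r), polar_vielbein(r)))    # [[1.0, 0.0], [0.0, 1.0]]
print(transform_metric(identity, polar_inverse_vielbein(r)))   # [[1.0, 0.0], [0.0, 4.0]]
```

The same function works unchanged for the four-dimensional frames later in the chapter, since it only assumes square nested lists.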
A vielbein can be presented as a matrix as we now demonstrate.
Example 10.6
Consider the metric with line element $\mathrm{d} s^{2}=\mathrm{d} \theta^{2}+\sin ^{2} \theta \mathrm{~d} \phi^{2}$. Using the method above, we can compute the matrix representing the components of the coordinate basis vectors in the orthonormal basis${}^{9}$
\begin{equation*}
\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}=\left(\begin{array}{cc}
1 & 0 \\
0 & \sin \theta
\end{array}\right) .
\end{equation*}
Once we have the vielbein we need the general rule${}^{10}$ that the vielbein components $\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\beta}$ remove a down coordinate component $\beta$ and replace it with an orthonormal component $\hat{\mu}$. They also replace an up orthonormal component $\hat{\mu}$ with an up coordinate component $\beta$. That is to say
\begin{equation*}
p_{\hat{\mu}}=\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\beta} p_{\beta}, \quad p^{\beta}=\left(\boldsymbol{e}_{\hat{\mu}}\right)^{\beta} p^{\hat{\mu}} .
\end{equation*}
The use of vielbein components generalizes the rule that the energy measured by an observer with velocity $\boldsymbol{u}_{\mathrm{obs}}$ is given by $E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}$, where $\boldsymbol{p}$ is the momentum vector. This is because we always choose $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}_{\mathrm{obs}}$. To prove this we write
\begin{equation*}
E=-\boldsymbol{p} \cdot \boldsymbol{u}_{\mathrm{obs}}=-\boldsymbol{p} \cdot \boldsymbol{e}_{\hat{0}}=-p_{\nu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\nu} .
\end{equation*}
We know from the definitions of how a vielbein works that $-p_{\nu}\left(\boldsymbol{e}_{\hat{0}}\right)^{\nu}=-p_{\hat{0}}$. Finally, since in the orthonormal frame indices are manipulated with the Minkowski tensor $\eta$, we have that $E=-p_{\hat{0}}=-\eta_{\hat{0} \hat{0}} p^{\hat{0}}=p^{\hat{0}}$, as we require.
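The index rules can be exercised on the two-sphere of Example 10.6. Reading off from $\mathrm{d}s^{2}=\mathrm{d}\theta^{2}+\sin^{2}\theta\,\mathrm{d}\phi^{2}$, the diagonal vielbein has $(\boldsymbol{e}_{\hat{\theta}})^{\theta}=1$, $(\boldsymbol{e}_{\hat{\phi}})^{\phi}=1/\sin\theta$ and, for the inverse, $(\boldsymbol{e}_{\theta})^{\hat{\theta}}=1$, $(\boldsymbol{e}_{\phi})^{\hat{\phi}}=\sin\theta$. The sketch below (function names are illustrative) converts a vector's components both ways and checks two consequences: because the orthonormal metric on this surface is the identity, up- and down-index orthonormal components coincide, and the scalar $p_{\mu}p^{\mu}$ is the same in either basis.

```python
import math


def sphere_vielbein(theta):
    """Diagonal entries of (e_muhat)^mu and of its inverse (e_mu)^muhat on the unit 2-sphere."""
    s = math.sin(theta)
    return (1.0, 1.0 / s), (1.0, s)


def convert(theta, p_up):
    """Apply p_muhat = (e_muhat)^beta p_beta and p^muhat = (e_beta)^muhat p^beta."""
    e_hat, e_inv = sphere_vielbein(theta)
    g = (1.0, math.sin(theta) ** 2)                        # g_{theta theta}, g_{phi phi}
    p_down = tuple(gi * pi for gi, pi in zip(g, p_up))     # p_mu = g_{mu nu} p^nu
    p_down_hat = tuple(e * p for e, p in zip(e_hat, p_down))
    p_up_hat = tuple(e * p for e, p in zip(e_inv, p_up))
    return p_down, p_down_hat, p_up_hat


def back_to_coordinates(theta, p_up_hat):
    """Recover p^beta = (e_muhat)^beta p^muhat."""
    e_hat, _ = sphere_vielbein(theta)
    return tuple(e * p for e, p in zip(e_hat, p_up_hat))


theta, p_up = 0.7, (0.3, 2.0)
p_down, p_down_hat, p_up_hat = convert(theta, p_up)
print(p_down_hat, p_up_hat)                  # equal up to rounding
print(back_to_coordinates(theta, p_up_hat))  # recovers (0.3, 2.0) up to rounding
```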
Vielbein components will be very useful to us. Next, we need to identify some examples of frames in which observers make their measurements.
${}^{9}$ The vielbein components in this case can be written as
${}^{10}$ Life is made easier in understanding the action of a vielbein if we also employ our knowledge of 1-forms. Formally we define the action of the vielbein on basis vectors and basis 1-forms via
and use the inner product $\left\langle\boldsymbol{\omega}^{\alpha}, \boldsymbol{e}_{\mu}\right\rangle=\delta^{\alpha}{}_{\mu}$. To remove a vector like $\boldsymbol{e}_{\mu}$ from the second term in eqn 10.34, we note that, on taking an inner product with basis 1-form $\boldsymbol{\omega}^{\mu}$, we have
where we've fixed the vielbein components by
\begin{equation*}
\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\hat{\nu}}\right\rangle=\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\alpha}\right\rangle\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\mu} . \tag{10.36}
\end{equation*}
Note here that $\left(\boldsymbol{e}_{\hat{\nu}}\right)^{\mu}=\left\langle\boldsymbol{\omega}^{\mu}, \boldsymbol{e}_{\hat{\nu}}\right\rangle \neq \delta^{\mu}{}_{\hat{\nu}}$, since here we're working with the basis vectors of two different coordinate systems. The same method also yields
The other relationships can be confirmed using the idea that a 1-form can be written as $\overline{\boldsymbol{u}}=u_{\mu} \boldsymbol{\omega}^{\mu}=u_{\hat{\mu}} \boldsymbol{\omega}^{\hat{\mu}}$.
10.3 The orthonormal frame
In many of the cases we'll consider later in the book, we shall find that a very convenient orthonormal frame in which to carry out computations, particularly of curvature, is one where the observer is at rest relative to the coordinate frame. That is, the velocity expressed in the coordinate frame is $u^{\mu}=\mathrm{d} x^{\mu} / \mathrm{d} \tau=\left(u^{0}, 0,0,0\right)$, with $u^{0}$ fixed such that $g_{\mu \nu} u^{\mu} u^{\nu}=g_{00}\left(u^{0}\right)^{2}=-1$. This observer then chooses $\boldsymbol{e}_{\hat{t}}=\boldsymbol{u}$. We will call this rather natural choice${}^{11}$ the stationary orthonormal frame. In such a frame, we have a metric that looks locally like the Minkowski metric, but there is no reason to believe that the connection coefficients should vanish.${}^{12}$
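To find the orthonormal frame we effectively diagonalize the metric and normalize the components. That is, we are trying to solve
\begin{equation*}
g_{\mu \nu}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\nu}=\eta_{\hat{\alpha} \hat{\beta}} .
\end{equation*}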
In the most commonly encountered case of a metric that is already diagonal, we can simply normalize the components, as discussed in the next example.
Example 10.9
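For an observer at rest, $\boldsymbol{u}$ has a single non-zero component $u^{0}$. Its value is given via the normalization of the velocity, by
\begin{equation*}
\boldsymbol{u} \cdot \boldsymbol{u}=g_{00}\left(u^{0}\right)^{2}=-1,
\end{equation*}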
so we must have $u^{0}=\left(-g_{00}\right)^{-1 / 2}$. Our rule $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$ then mandates $\boldsymbol{e}_{\hat{0}}=\boldsymbol{e}_{0} / \sqrt{-g_{00}}$. For the diagonal metric we can then pick out an orthonormal basis
\begin{equation*}
\boldsymbol{e}_{\hat{0}}=\frac{\boldsymbol{e}_{0}}{\sqrt{-g_{00}}}, \quad \boldsymbol{e}_{\hat{i}}=\frac{\boldsymbol{e}_{i}}{\sqrt{g_{i i}}} \quad(\text {no sum on } i) .
\end{equation*}
We can check this works by considering the rule $g_{\mu \nu}\left(\boldsymbol{e}_{\hat{\alpha}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{\beta}}\right)^{\nu}=\eta_{\hat{\alpha} \hat{\beta}}$. If the metric $\boldsymbol{g}$ is diagonal then, by inspection, we have
\begin{equation*}
g_{00}\left[\left(\boldsymbol{e}_{\hat{0}}\right)^{0}\right]^{2}=\frac{g_{00}}{-g_{00}}=-1, \quad g_{i i}\left[\left(\boldsymbol{e}_{\hat{i}}\right)^{i}\right]^{2}=\frac{g_{i i}}{g_{i i}}=1 .
\end{equation*}
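The normalization procedure in the last example makes identifying the vielbein components for the orthonormal frame trivial for the diagonal metric. We simply normalize by writing
\begin{equation*}
\left(\boldsymbol{e}_{0}\right)^{\hat{0}}=\left(-g_{00}\right)^{\frac{1}{2}}, \quad\left(\boldsymbol{e}_{i}\right)^{\hat{i}}=g_{i i}^{\frac{1}{2}} .
\end{equation*}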
In this way, we can think of the vielbein components as the square roots of the (magnitudes of the diagonal) metric components.
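For a diagonal metric the whole procedure of Example 10.9 reduces to taking square roots. The sketch below, with illustrative function names, builds $(\boldsymbol{e}_{\hat{a}})^{a}=|g_{aa}|^{-1/2}$ and $(\boldsymbol{e}_{a})^{\hat{a}}=|g_{aa}|^{1/2}$ and checks that $g_{\mu\nu}(\boldsymbol{e}_{\hat{\alpha}})^{\mu}(\boldsymbol{e}_{\hat{\beta}})^{\nu}$ reproduces $\operatorname{diag}(-1,1,1,1)$. It refuses metrics with $g_{00}\ge 0$, since then $u^{0}=(-g_{00})^{-1/2}$ is not real and no observer can be at rest; the sample metric values are arbitrary.

```python
import math


def diagonal_vielbein(g_diag):
    """Vielbein of the stationary orthonormal frame for a diagonal metric.

    Returns (e_hat, e_inv) with e_hat[a] = (e_ahat)^a = |g_aa|^(-1/2) and
    e_inv[a] = (e_a)^ahat = |g_aa|^(1/2); off-diagonal entries are zero.
    """
    if g_diag[0] >= 0 or any(g <= 0 for g in g_diag[1:]):
        raise ValueError("need g_00 < 0 < g_ii for an observer at rest")
    e_hat = [1.0 / math.sqrt(abs(g)) for g in g_diag]
    e_inv = [math.sqrt(abs(g)) for g in g_diag]
    return e_hat, e_inv


def frame_metric(g_diag, e_hat):
    """Diagonal of g_{mu nu} (e_ahat)^mu (e_bhat)^nu; should be (-1, 1, 1, 1)."""
    return [g * e * e for g, e in zip(g_diag, e_hat)]


g_diag = [-0.5, 2.0, 9.0, 4.0]                    # arbitrary sample values
e_hat, e_inv = diagonal_vielbein(g_diag)
print(frame_metric(g_diag, e_hat))                # close to [-1, 1, 1, 1]
print([a * b for a, b in zip(e_hat, e_inv)])      # each close to 1: the matrices are inverses
```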
Example 10.10
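As we shall see later, a spherically symmetric gravitating object of mass $M$ gives rise to the Schwarzschild metric with line element given by${}^{14}$
\begin{equation*}
\mathrm{d} s^{2}=-\left(1-\frac{2 M}{r}\right) \mathrm{d} t^{2}+\left(1-\frac{2 M}{r}\right)^{-1} \mathrm{~d} r^{2}+r^{2}\left(\mathrm{~d} \theta^{2}+\sin ^{2} \theta \mathrm{~d} \phi^{2}\right) .
\end{equation*}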
We can identify an orthonormal frame in this so-called Schwarzschild geometry, which has coordinates ordered $(t, r, \theta, \phi)$. An observer at rest in this geometry has a velocity vector $\boldsymbol{u}$ with components
\begin{equation*}
u^{\mu}=\left(\left(1-\frac{2 M}{r}\right)^{-\frac{1}{2}}, 0,0,0\right),
\end{equation*}
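so that $\boldsymbol{u} \cdot \boldsymbol{u}=g_{t t}\left(1-\frac{2 M}{r}\right)^{-1}=-1$, as it must. We set $\boldsymbol{e}_{\hat{t}}=\boldsymbol{u}$. We then choose
\begin{equation*}
\boldsymbol{e}_{\hat{r}}=\left(1-\frac{2 M}{r}\right)^{\frac{1}{2}} \boldsymbol{e}_{r}, \quad \boldsymbol{e}_{\hat{\theta}}=\frac{1}{r} \boldsymbol{e}_{\theta}, \quad \boldsymbol{e}_{\hat{\phi}}=\frac{1}{r \sin \theta} \boldsymbol{e}_{\phi} .
\end{equation*}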
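It's not hard to see that these obey the defining rules above. Alternatively, writing the non-zero components of the vielbein explicitly,${}^{15}$
\begin{equation*}
\left(\boldsymbol{e}_{\hat{t}}\right)^{t}=\left(1-\frac{2 M}{r}\right)^{-\frac{1}{2}}, \quad\left(\boldsymbol{e}_{\hat{r}}\right)^{r}=\left(1-\frac{2 M}{r}\right)^{\frac{1}{2}}, \quad\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=\frac{1}{r}, \quad\left(\boldsymbol{e}_{\hat{\phi}}\right)^{\phi}=\frac{1}{r \sin \theta} .
\end{equation*}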
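As a numerical sanity check of Example 10.10, the sketch below evaluates the static-observer vielbein at a few radii, in units $G=c=1$ with the illustrative choice $M=1$. It assumes the standard Schwarzschild components $g_{tt}=-(1-2M/r)$, $g_{rr}=(1-2M/r)^{-1}$, $g_{\theta\theta}=r^{2}$, $g_{\phi\phi}=r^{2}\sin^{2}\theta$, consistent with $\boldsymbol{u}\cdot\boldsymbol{u}=g_{tt}(1-2M/r)^{-1}=-1$ in the text, and the diagonal vielbein obtained by normalizing each coordinate basis vector. It also makes one point explicit: for $r\le 2M$ we have $g_{tt}\ge 0$, so no observer can remain at rest there. Function names are hypothetical.

```python
import math


def schwarzschild_diag(M, r, theta):
    """(g_tt, g_rr, g_thetatheta, g_phiphi) in coordinates (t, r, theta, phi), G = c = 1."""
    f = 1.0 - 2.0 * M / r
    return [-f, 1.0 / f, r * r, (r * math.sin(theta)) ** 2]


def static_vielbein(M, r, theta):
    """Non-zero (e_ahat)^a for an observer at rest: f^(-1/2), f^(1/2), 1/r, 1/(r sin theta)."""
    f = 1.0 - 2.0 * M / r
    if f <= 0:
        raise ValueError("no observer can stay at rest at r <= 2M")
    return [f ** -0.5, f ** 0.5, 1.0 / r, 1.0 / (r * math.sin(theta))]


M, theta = 1.0, 1.1
for r in (3.0, 10.0, 100.0):
    g = schwarzschild_diag(M, r, theta)
    e = static_vielbein(M, r, theta)
    # g_aa [(e_ahat)^a]^2 should be (-1, 1, 1, 1); the first entry is u . u
    print(r, [round(gi * ei * ei, 12) for gi, ei in zip(g, e)])
```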
where $\rho$ is an energy density and $p$ is a pressure. We can express these in the orthonormal frame. So, for example, $T_{\theta \theta}$ becomes
There are several possible orthonormal frames that can be identified. In Chapter 6, we discussed the possibility of finding locally inertial frames (LIFs) in which, in addition to the metric being identical to the Minkowski metric, the frame also has the property that the first derivatives of the metric components $\partial g_{\mu \nu} / \partial x^{\alpha}$ vanish at the point considered. This implies that the connection coefficients $\Gamma^{\mu}{}_{\alpha \beta}$ also vanish at that point.
There are a few methods for identifying LIFs, but one of the most useful is the freely falling frame. Recall that a body that is freely falling follows a (timelike) geodesic curve in spacetime. The equivalence principle tells us that a sufficiently small laboratory in free fall should not be able to detect any gravitation. As a result of this, we might expect the laboratory's coordinate system to have vanishing connection coefficients. This is indeed the case.
Freely falling frames are therefore defined to possess a system of coordinates in which the connection coefficients vanish along the geodesic that describes their free fall. The frame is described via a set of orthonormal basis vectors $\boldsymbol{e}_{\hat{\alpha}}(\tau)$ that we should determine in order to be able to understand the results of measurements.
Consider the geodesic of the falling observer with proper time $\tau$, which is the curve $x^{\mu}(\tau)$. The observer's 4-velocity is $\boldsymbol{u}(\tau)=\frac{\mathrm{d} x^{\mu}}{\mathrm{d} \tau} \boldsymbol{e}_{\mu}$. This vector is identified with the zeroth basis vector $\boldsymbol{e}_{\hat{0}}(\tau)=\boldsymbol{u}(\tau)$. We can find the spatial basis vectors at some point along the geodesic by identifying a set of orthonormal vectors that are perpendicular to $\boldsymbol{u}$. The basis vectors at other points could be found by parallel transporting the basis vectors along the geodesic. Since the connection coefficients vanish, the freely falling frame is then defined by${}^{17}$
\begin{equation*}
\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{e}_{\hat{\alpha}}=0
\end{equation*}
for all $\alpha$. Notice that this is automatically satisfied for $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$ by definition for a geodesic.
Example 10.12
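We shall see in Chapter 22 that a particle falling radially inwards from rest at infinity in the Schwarzschild geometry has velocity $\boldsymbol{u}$ with components in the coordinate frame of
\begin{equation*}
u^{\mu}=\left(\left(1-\frac{2 M}{r}\right)^{-1},-\left(\frac{2 M}{r}\right)^{\frac{1}{2}}, 0,0\right) .
\end{equation*}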
We write down that $\boldsymbol{e}_{\hat{t}}=\boldsymbol{u}(\tau)$. Where they can be found, diagonal vielbein components are the simplest to use. So, as in the previous examples, we write $\left(\boldsymbol{e}_{\hat{\theta}}\right)^{\theta}=1 / r$ and $\left(\boldsymbol{e}_{\hat{\phi}}\right)^{\phi}=1 /(r \sin \theta)$. For $\left(\boldsymbol{e}_{\hat{r}}\right)^{\alpha}$ we have $g_{\mu \nu}\left(\boldsymbol{e}_{\hat{r}}\right)^{\mu}\left(\boldsymbol{e}_{\hat{r}}\right)^{\nu}=\eta_{\hat{r} \hat{r}}$, or
\begin{equation*}
-\left(1-\frac{2 M}{r}\right)\left[\left(\boldsymbol{e}_{\hat{r}}\right)^{t}\right]^{2}+\left(1-\frac{2 M}{r}\right)^{-1}\left[\left(\boldsymbol{e}_{\hat{r}}\right)^{r}\right]^{2}=1 .
\end{equation*}
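Choose $\left(\boldsymbol{e}_{\hat{r}}\right)^{r}=1$, fix the sign of $\left(\boldsymbol{e}_{\hat{r}}\right)^{t}$ by requiring orthogonality to $\boldsymbol{e}_{\hat{t}}$, and conclude that the result is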
\begin{aligned}
& \left(\boldsymbol{e}_{\hat{t}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{0}}\right)^{\alpha}=\left((1-2 M / r)^{-1},-(2 M / r)^{\frac{1}{2}}, 0,0\right), \\
& \left(\boldsymbol{e}_{\hat{r}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{1}}\right)^{\alpha}=\left(-(2 M / r)^{\frac{1}{2}}(1-2 M / r)^{-1}, 1,0,0\right), \\
& \left(\boldsymbol{e}_{\hat{\theta}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{2}}\right)^{\alpha}=(0,0,1 / r, 0), \\
& \left(\boldsymbol{e}_{\hat{\phi}}\right)^{\alpha}=\left(\boldsymbol{e}_{\hat{3}}\right)^{\alpha}=\left(0,0,0,(r \sin \theta)^{-1}\right) .
\end{aligned}
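The orthonormality of this freely falling frame can be confirmed numerically. The sketch below builds the $4\times 4$ Schwarzschild metric (assumed standard form, $G=c=1$) and the four frame vectors of Example 10.12, then forms the Gram matrix $g_{\mu\nu}(\boldsymbol{e}_{\hat{\alpha}})^{\mu}(\boldsymbol{e}_{\hat{\beta}})^{\nu}$, which should equal $\operatorname{diag}(-1,1,1,1)$ at any radius outside $r=2M$. The sample values $M=1$, $r\in\{3,6,50\}$, $\theta=0.8$ and the function names are arbitrary choices.

```python
import math


def schwarzschild_metric(M, r, theta):
    """4x4 Schwarzschild metric in (t, r, theta, phi) with G = c = 1."""
    f = 1.0 - 2.0 * M / r
    g = [[0.0] * 4 for _ in range(4)]
    g[0][0] = -f
    g[1][1] = 1.0 / f
    g[2][2] = r * r
    g[3][3] = (r * math.sin(theta)) ** 2
    return g


def infalling_frame(M, r, theta):
    """(e_ahat)^alpha for radial free fall from rest at infinity, as in Example 10.12."""
    f = 1.0 - 2.0 * M / r
    s = math.sqrt(2.0 * M / r)
    return [
        [1.0 / f, -s, 0.0, 0.0],                        # e_that = u
        [-s / f, 1.0, 0.0, 0.0],                        # e_rhat
        [0.0, 0.0, 1.0 / r, 0.0],                       # e_thetahat
        [0.0, 0.0, 0.0, 1.0 / (r * math.sin(theta))],   # e_phihat
    ]


def gram(g, frame):
    """G_ab = g_{mu nu} (e_ahat)^mu (e_bhat)^nu."""
    return [[sum(g[m][n] * ea[m] * eb[n] for m in range(4) for n in range(4))
             for eb in frame] for ea in frame]


M, theta = 1.0, 0.8
for r in (3.0, 6.0, 50.0):
    G = gram(schwarzschild_metric(M, r, theta), infalling_frame(M, r, theta))
    print(r, [[round(x, 10) for x in row] for row in G])   # diag(-1, 1, 1, 1) expected
```

The off-diagonal $\hat{t}\hat{r}$ entry is where the sign choice for $(\boldsymbol{e}_{\hat{r}})^{t}$ matters; flipping that sign would make it non-zero.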
We saw in Chapter 7, Example 7.10, that for a system in which the connection coefficients vanish, we should have $\mathrm{D} \boldsymbol{\chi} / \mathrm{d} \tau=\mathrm{d} \boldsymbol{\chi} / \mathrm{d} \tau$. We should check that $\boldsymbol{\chi}$ in a freely falling frame (defined by $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{e}_{\hat{\alpha}}=0$) has this property.
Example 10.13
Note that in the freely falling frame we have the defining fact that the basis vectors and basis 1-forms are parallel transported.
Our first task is to express the covariant derivative vector $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}$ in the freely falling frame. To do this we use a vielbein $\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}$ to bring its components into the orthonormal freely falling system. The components are $\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\hat{\mu}}=\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\alpha}$, which can be rewritten as${}^{18}$
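But this is just the covariant derivative of the $\hat{\mu}$ component of $\boldsymbol{\chi}$, or $\boldsymbol{\nabla}_{\boldsymbol{u}}\left(\chi^{\hat{\mu}}\right)$. The covariant derivative of a component of a vector is the same as the covariant derivative of some scalar function, which (as we saw in the earlier sidenote) is just the directional derivative, which is to say
\begin{equation*}
\left(\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right)^{\hat{\mu}}=\boldsymbol{\nabla}_{\boldsymbol{u}}\left(\chi^{\hat{\mu}}\right)=\frac{\mathrm{d} \chi^{\hat{\mu}}}{\mathrm{d} \tau} .
\end{equation*}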
At various points in the remainder of the book, we will deploy vielbein components in order to efficiently perform certain calculations. In the next chapter, we turn to the long-awaited method of determining the curvature of spacetime.
${}^{18}$ A useful step in seeing this is to write the vielbein components as $\left(\boldsymbol{e}_{\alpha}\right)^{\hat{\mu}}=\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{e}_{\alpha}\right\rangle$, so we have
Next, we note that $\boldsymbol{\nabla}_{\boldsymbol{u}}\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\chi}\right\rangle=\left\langle\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\chi}\right\rangle+\left\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\chi}\right\rangle$, but $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{\omega}^{\hat{\mu}}=0$ by definition of the freely falling frame. So the right-hand side of eqn 10.65 becomes
Measurements in general relativity are carried out by observers who carry around an orthonormal coordinate system with basis vectors $\boldsymbol{e}_{\hat{\alpha}}$. An observer with velocity $\boldsymbol{u}$ has $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$.
A vielbein, with components $\left(\boldsymbol{e}_{\mu}\right)^{\hat{\alpha}}$, is a matrix that allows the transformation between orthonormal frames and coordinate frames.
A particularly convenient frame is an orthonormal one where the observer is at rest relative to the coordinate frame. An alternative is the freely falling frame (defined by $\boldsymbol{\nabla}_{\boldsymbol{u}} \boldsymbol{e}_{\hat{\alpha}}=0$ for all $\alpha$), in which the connection coefficients also vanish.
In the (stationary) orthonormal frame, the observer's 3-velocity vanishes and so, using $\boldsymbol{u} \cdot \boldsymbol{u}=g_{00}\left(u^{0}\right)^{2}=-1$, we have $u^{0}=\left(-g_{00}\right)^{-\frac{1}{2}}$. The observer orients their orthonormal frame with $\boldsymbol{e}_{\hat{0}}=\boldsymbol{u}$, so $\boldsymbol{e}_{\hat{0}}=\left(-g_{00}\right)^{-\frac{1}{2}} \boldsymbol{e}_{0}$, or $\left(\boldsymbol{e}_{\hat{0}}\right)^{0}=\left(-g_{00}\right)^{-\frac{1}{2}}$.
In the orthonormal frame for a diagonal metric, we have $(\boldsymbol{e}_{0})^{\hat{0}} = (-g_{00})^{1/2}$ and $(\boldsymbol{e}_{i})^{\hat{i}} = g_{ii}^{1/2}$.
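The static-observer results in this summary can be checked numerically. The Python sketch below is illustrative only: the metric is assumed to be Schwarzschild in units with $G = c = M = 1$, evaluated at the hypothetical point $r = 10$, $\theta = 1$; index 0 is assumed to be the timelike one; and the helper names are hypothetical. It computes $u^0$, confirms $\boldsymbol{u}\cdot\boldsymbol{u} = -1$, builds the diagonal vielbein and its inverse, and rebuilds $g_{\mu\nu} = \eta_{\hat{\alpha}\hat{\beta}}(\boldsymbol{e}_{\mu})^{\hat{\alpha}}(\boldsymbol{e}_{\nu})^{\hat{\beta}}$.

```python
import math

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal of eta_{hat alpha hat beta}; index 0 assumed timelike


def static_observer_u0(g00):
    """u^0 = (-g_00)^(-1/2) for an observer at rest (signature -+++, c = 1)."""
    if g00 >= 0.0:
        # Nobody can stay at rest where g_00 >= 0 (e.g. inside a horizon).
        raise ValueError("static observers need g_00 < 0")
    return (-g00) ** -0.5


def diagonal_vielbein(g_diag):
    """E[mu][a] = (e_mu)^{hat a}: sqrt(|g_mu mu|) on the diagonal, zero elsewhere."""
    n = len(g_diag)
    return [[math.sqrt(abs(g_diag[mu])) if a == mu else 0.0 for a in range(n)]
            for mu in range(n)]


def inverse_vielbein(g_diag):
    """Einv[a][mu] = (e_{hat a})^mu: 1 / sqrt(|g_mu mu|) on the diagonal."""
    n = len(g_diag)
    return [[1.0 / math.sqrt(abs(g_diag[mu])) if a == mu else 0.0 for mu in range(n)]
            for a in range(n)]


def metric_from_vielbein(E):
    """Rebuild g_{mu nu} = eta_{ab} (e_mu)^a (e_nu)^b (eta is diagonal here)."""
    n = len(E)
    return [[sum(ETA[a] * E[mu][a] * E[nu][a] for a in range(n)) for nu in range(n)]
            for mu in range(n)]


# Hypothetical example: Schwarzschild with G = c = M = 1 at r = 10, theta = 1.
r, theta = 10.0, 1.0
f = 1.0 - 2.0 / r
g = [-f, 1.0 / f, r ** 2, (r * math.sin(theta)) ** 2]

u0 = static_observer_u0(g[0])
E, Einv = diagonal_vielbein(g), inverse_vielbein(g)
g_rebuilt = metric_from_vielbein(E)

print("u.u =", g[0] * u0 ** 2)                  # normalization, should be -1
print("(e_0hat)^0 =", Einv[0][0], "u^0 =", u0)  # e_0hat = u for the static observer
print("g rebuilt diag =", [g_rebuilt[i][i] for i in range(4)])
```

Because the vielbein is diagonal, $(\boldsymbol{e}_{\hat{0}})^{0}$ is simply the reciprocal of $(\boldsymbol{e}_{0})^{\hat{0}}$, which is why it coincides with $u^0$.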
(a) Using a coordinate system $(t, \chi, \theta, \phi)$, a vector has components $V^{\mu} = (V^{t}, V^{\chi}, V^{\theta}, V^{\phi})$ in the coordinate frame. What are the vector's components in the orthonormal frame?
(b) A $(1,2)$ tensor has a non-zero component $G^{\theta}{}_{\chi\phi}$. What does this become in the orthonormal frame?
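The metric for this exercise is not reproduced in this excerpt, so the Python sketch below uses a hypothetical diagonal metric in $(t, \chi, \theta, \phi)$, namely $\mathrm{d}s^2 = -\mathrm{d}t^2 + a^2[\mathrm{d}\chi^2 + \sin^2\chi(\mathrm{d}\theta^2 + \sin^2\theta\,\mathrm{d}\phi^2)]$, with made-up values of $a$, $\chi$ and $\theta$. It illustrates the mechanics only: each upper index picks up a factor $(\boldsymbol{e}_{\mu})^{\hat{\mu}} = |g_{\mu\mu}|^{1/2}$ and each lower index a factor of the inverse vielbein, $|g_{\mu\mu}|^{-1/2}$. The function names are hypothetical.

```python
import math


def vielbein_factors(g_diag):
    """sqrt(|g_mu mu|): the non-zero vielbein entries (e_mu)^{hat mu} of a diagonal metric."""
    return [math.sqrt(abs(g)) for g in g_diag]


def vector_to_orthonormal(V, g_diag):
    """V^{hat mu} = (e_mu)^{hat mu} V^mu (no sum, diagonal metric)."""
    return [e * v for e, v in zip(vielbein_factors(g_diag), V)]


def tensor12_component_to_orthonormal(G, up, low1, low2, g_diag):
    """G^{hat up}_{hat low1 hat low2} from the coordinate component G^{up}_{low1 low2}.

    The upper index is multiplied by (e_up)^{hat up}; each lower index by the
    inverse vielbein (e_{hat low})^{low} = 1 / sqrt(|g_low low|).
    """
    e = vielbein_factors(g_diag)
    return G * e[up] / (e[low1] * e[low2])


# Hypothetical metric for illustration only (coordinates t, chi, theta, phi):
#   ds^2 = -dt^2 + a^2 [dchi^2 + sin^2(chi) (dtheta^2 + sin^2(theta) dphi^2)]
a, chi, theta = 2.0, 0.5, 1.0
g = [-1.0, a ** 2, (a * math.sin(chi)) ** 2, (a * math.sin(chi) * math.sin(theta)) ** 2]

T, CHI, THETA, PHI = range(4)
print(vector_to_orthonormal([1.0, 1.0, 1.0, 1.0], g))
print(tensor12_component_to_orthonormal(1.0, THETA, CHI, PHI, g))
```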
(10.2) (a) Working in the orthonormal frame, find the connection coefficients for flat space represented in cylindrical polar coordinates.
Hint: Remember that the connection coefficients do not transform like tensors, so you cannot simply use the vielbein. You can compute the coefficients directly from the definitions of the basis vectors, or transform using eqn 7.11.
(b) Show that the connection coefficients you have derived obey the rule $\Gamma^{\hat{\mu}}{}_{\hat{\alpha}\hat{\beta}} - \Gamma^{\hat{\mu}}{}_{\hat{\beta}\hat{\alpha}} = \langle\boldsymbol{\omega}^{\hat{\mu}}, [\boldsymbol{e}_{\hat{\alpha}}, \boldsymbol{e}_{\hat{\beta}}]\rangle$.
(10.3) This problem combines several ideas from the last few chapters and is a useful warm-up for some of the physics in Part IV of the book.
A particle with velocity vector $\boldsymbol{u}$ travels radially in a static, spherically symmetric gravitational field described by diagonal metric components $g_{\mu\nu}$.
(a) If the timelike component of the particle's velocity 1-form is given in the static frame of the potential by a constant $u_{t} = a$, give the other components in terms of $a$ and the components of the metric.
(b) Compute the coordinate velocity $\mathrm{d}r/\mathrm{d}t$.
(c) What is the coordinate velocity, as measured by a local observer?
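A numerical sketch of parts (a) to (c), under stated assumptions: signature $(-,+,+,+)$ with $c = 1$; $u_\theta = u_\phi = 0$ for purely radial motion; a future-directed particle, so that $a = u_t < 0$; and, for the example, the hypothetical choice of a Schwarzschild metric with $G = c = M = 1$ at $r = 10$ and $a = -1$. For that choice the local speed should come out as $\sqrt{2M/r}$, the familiar result for a particle falling from rest at infinity, which serves as a sanity check. The function name `radial_motion` is hypothetical.

```python
import math


def radial_motion(a, g_tt, g_rr, infalling=True):
    """Return (u_r, dr/dt, local speed) for radial motion with u_t = a.

    Assumptions: signature (-+++), c = 1, diagonal metric with g_tt < 0 < g_rr,
    and u_theta = u_phi = 0 because the motion is purely radial.
    """
    # Normalization g^{tt} u_t^2 + g^{rr} u_r^2 = -1, with g^{mu mu} = 1 / g_{mu mu}.
    u_r_sq = g_rr * (-1.0 - a * a / g_tt)
    if u_r_sq < 0.0:
        raise ValueError("a particle with this u_t cannot be at this radius")
    u_r = -math.sqrt(u_r_sq) if infalling else math.sqrt(u_r_sq)

    u_up_t = a / g_tt    # u^t = g^{tt} u_t  (positive for a < 0: future-directed)
    u_up_r = u_r / g_rr  # u^r = g^{rr} u_r
    dr_dt = u_up_r / u_up_t

    # A static local observer measures proper distance sqrt(g_rr) dr
    # and proper time sqrt(-g_tt) dt.
    v_local = math.sqrt(g_rr / -g_tt) * dr_dt
    return u_r, dr_dt, v_local


# Hypothetical example: Schwarzschild with G = c = M = 1 at r = 10, and u_t = -1.
r = 10.0
g_tt, g_rr = -(1.0 - 2.0 / r), 1.0 / (1.0 - 2.0 / r)
u_r, dr_dt, v_local = radial_motion(-1.0, g_tt, g_rr)
print(u_r, dr_dt, v_local)
```

Note that $|\mathrm{d}r/\mathrm{d}t|$ and the locally measured speed differ by the factor $(g_{rr}/(-g_{tt}))^{1/2}$, which tends to 1 far from the source.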
(10.4) Consider flat spacetime expressed in an orthogonal coordinate system $(x^{1}, x^{2}, x^{3})$ with a diagonal metric.
(a) Show that the gradient operator acting on a function $f$ becomes
What does this formula yield for (d) orthonormal cylindrical polar coordinates and (e) orthonormal spherical polar coordinates?
(10.5) A light signal is emitted by a source on the rim of a centrifuge and detected by a detector at another point on the rim, separated by an angle $\alpha$. Use the metric for the rotating frame
to show that there is no shift in the frequency of the signal.
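As an independent cross-check of the result, the Python sketch below swaps in a different method from the one the exercise asks for: it works in the inertial lab frame of flat spacetime (with $c = 1$) instead of the rotating-frame metric. It finds the reception event, takes the photon direction $\hat{\boldsymbol{n}}$, and compares the frequencies $-\boldsymbol{k}\cdot\boldsymbol{u} \propto \gamma(1 - \hat{\boldsymbol{n}}\cdot\boldsymbol{v})$ measured by source and detector. The radius, angular velocity and angle in the example, and the function name, are hypothetical.

```python
import math


def frequency_ratio(R, omega, alpha, iterations=200):
    """nu_detected / nu_emitted for light crossing the rim of a rotating disc.

    Worked in the inertial lab frame (flat spacetime, c = 1) rather than with
    the rotating-frame metric: the source is at lab angle 0 at t = 0 and the
    detector, separated from it by alpha, is at lab angle alpha + omega t.
    """
    if abs(omega) * R >= 1.0:
        raise ValueError("rim speed must be below c")
    src = (R, 0.0)

    # Reception time: the light travel time equals the distance to where the
    # detector is when the light arrives. The map is a contraction because the
    # rim speed |omega| R < 1, so fixed-point iteration converges.
    t = 0.0
    for _ in range(iterations):
        ang = alpha + omega * t
        t = math.dist(src, (R * math.cos(ang), R * math.sin(ang)))

    ang = alpha + omega * t
    det = (R * math.cos(ang), R * math.sin(ang))
    d = math.dist(src, det)
    n = ((det[0] - src[0]) / d, (det[1] - src[1]) / d)  # photon direction

    def measured(theta):
        # Frequency -k.u seen by a rim point at lab angle theta, up to the common
        # factor nu: gamma (1 - n.v) with v = omega R (-sin theta, cos theta).
        v = (-omega * R * math.sin(theta), omega * R * math.cos(theta))
        gamma = 1.0 / math.sqrt(1.0 - v[0] ** 2 - v[1] ** 2)
        return gamma * (1.0 - (n[0] * v[0] + n[1] * v[1]))

    return measured(ang) / measured(0.0)


print(frequency_ratio(R=1.0, omega=0.6, alpha=1.2))
```

The cancellation is geometric: the chord joining two points on a circle makes equal angles with the circle at both ends, so $\hat{\boldsymbol{n}}\cdot\boldsymbol{v}$ is the same for source and detector, and both move with the same speed and hence the same $\gamma$.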
(10.6) An alternative to orthonormal local basis vectors was suggested by Newman and Penrose. They considered a pair of real null vectors $\boldsymbol{l}$ and $\boldsymbol{n}$ and a pair of complex-conjugate null vectors $\boldsymbol{m}$ and $\overline{\boldsymbol{m}}$ obeying
The vectors are normalized according to $\boldsymbol{l}\cdot\boldsymbol{n} = 1$ and $\boldsymbol{m}\cdot\overline{\boldsymbol{m}} = -1$.
(a) If we take the local basis to be
find the components of the local metric $\eta_{\hat{\mu}\hat{\nu}}$.
(b) Find the local basis 1-forms $\boldsymbol{\omega}^{\hat{\mu}}$, assuming the usual relationship $\langle\boldsymbol{\omega}^{\hat{\mu}}, \boldsymbol{e}_{\hat{\nu}}\rangle = \delta^{\hat{\mu}}{}_{\hat{\nu}}$.
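The defining conditions and the chosen basis are not reproduced in this excerpt, so the Python sketch below rests on several loudly flagged assumptions: the basis is ordered $(\boldsymbol{l}, \boldsymbol{n}, \boldsymbol{m}, \overline{\boldsymbol{m}})$; $\boldsymbol{l}$ and $\boldsymbol{n}$ are orthogonal to $\boldsymbol{m}$ and $\overline{\boldsymbol{m}}$ (the usual Newman-Penrose conditions); and the Minkowski signature is taken as $(+,-,-,-)$, the one in which the stated normalizations hold for future-directed $\boldsymbol{l}$ and $\boldsymbol{n}$, whereas this book elsewhere uses $\boldsymbol{u}\cdot\boldsymbol{u} = -1$. The explicit tetrad is one hypothetical choice. The dot product is bilinear, with no complex conjugation.

```python
import math

SIGNATURE = [1.0, -1.0, -1.0, -1.0]  # assumed Minkowski signature (+,-,-,-)


def dot(a, b):
    """Bilinear product g(a, b); deliberately no complex conjugation."""
    return sum(g * x * y for g, x, y in zip(SIGNATURE, a, b))


def lower(v):
    """Components g_{alpha beta} v^beta of the 1-form dual to v."""
    return [g * x for g, x in zip(SIGNATURE, v)]


# One hypothetical null tetrad, components in Cartesian (t, x, y, z).
s = 1.0 / math.sqrt(2.0)
l = [s, 0.0, 0.0, s]
n = [s, 0.0, 0.0, -s]
m = [0.0, s, 1j * s, 0.0]
m_bar = [complex(c).conjugate() for c in m]
basis = [l, n, m, m_bar]  # assumed ordering (e_0, e_1, e_2, e_3)

# (a) Local metric eta_{mu nu} = e_mu . e_nu
eta_hat = [[dot(a, b) for b in basis] for a in basis]

# This eta_hat squares to the identity, so it is its own inverse eta^{mu nu}.
eta_sq = [[sum(eta_hat[i][k] * eta_hat[k][j] for k in range(4)) for j in range(4)]
          for i in range(4)]

# (b) Basis 1-forms (omega^mu)_alpha = eta^{mu nu} g_{alpha beta} (e_nu)^beta
omegas = [[sum(eta_hat[mu][nu] * lower(basis[nu])[k] for nu in range(4)) for k in range(4)]
          for mu in range(4)]
pairing = [[sum(w[k] * e[k] for k in range(4)) for e in basis] for w in omegas]

for row in eta_hat:
    print([round(complex(x).real, 12) for x in row])
```

Under these assumptions the rows of `omegas` are $\boldsymbol{n}$, $\boldsymbol{l}$, $-\overline{\boldsymbol{m}}$ and $-\boldsymbol{m}$ with their indices lowered, and `pairing` reproduces $\delta^{\hat{\mu}}{}_{\hat{\nu}}$.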
(10.7) Suggest vielbein components for a (1+1)-dimensional metric with line element $\mathrm{d}s^{2} = -\mathrm{d}u\,\mathrm{d}v$.
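One possible answer can be checked with exact rational arithmetic. The Python sketch below assumes $\eta_{\hat{\alpha}\hat{\beta}} = \mathrm{diag}(-1, 1)$ and the hypothetical identification $t = (u + v)/2$, $x = (v - u)/2$, for which $-\mathrm{d}u\,\mathrm{d}v = -\mathrm{d}t^2 + \mathrm{d}x^2$; that gives $(\boldsymbol{e}_{u})^{\hat{0}} = (\boldsymbol{e}_{v})^{\hat{0}} = \tfrac{1}{2}$, $(\boldsymbol{e}_{u})^{\hat{1}} = -\tfrac{1}{2}$ and $(\boldsymbol{e}_{v})^{\hat{1}} = \tfrac{1}{2}$. It rebuilds $g_{\mu\nu}$, which should have $g_{uu} = g_{vv} = 0$ and $g_{uv} = -\tfrac{1}{2}$, and repeats the check after a Lorentz boost ($\gamma = 5/4$, $\gamma\beta = 3/4$) to show that the vielbein is not unique.

```python
from fractions import Fraction as F

ETA = [[-1, 0], [0, 1]]  # eta_{hat alpha hat beta}, signature (-, +)


def metric(E):
    """g_{mu nu} = eta_{ab} (e_mu)^a (e_nu)^b, with E[mu][a] = (e_mu)^{hat a}."""
    return [[sum(ETA[a][b] * E[mu][a] * E[nu][b] for a in range(2) for b in range(2))
             for nu in range(2)] for mu in range(2)]


# Hypothetical vielbein, coordinates ordered (u, v).
E = [[F(1, 2), F(-1, 2)],  # e_u
     [F(1, 2), F(1, 2)]]   # e_v

# A Lorentz boost with gamma = 5/4, gamma*beta = 3/4 (exact, since 25/16 - 9/16 = 1).
BOOST = [[F(5, 4), F(3, 4)], [F(3, 4), F(5, 4)]]
E_boosted = [[sum(BOOST[a][b] * E[mu][b] for b in range(2)) for a in range(2)]
             for mu in range(2)]

g = metric(E)
g_boosted = metric(E_boosted)
print(g)          # expect g_uu = g_vv = 0 and g_uv = g_vu = -1/2
print(g_boosted)  # the same metric from a different vielbein
```

Any vielbein related to this one by a local Lorentz transformation of the hatted index reproduces the same line element.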
$^{1}$A freely falling body is one that experiences only the effects of gravity.
$^{2}$This is sometimes called the principle of weak equivalence, but we will take the view that the weakness is an attribute of the principle. The corresponding strong principle of equivalence will be introduced on the following page. Why is this principle weak? Because it only applies to mechanical forces.